Abstract


In this paper we investigate properties of retiming, a circuit transformation which 
preserves the behavior of the circuit as a whole. We present an algorithm which 
transforms a given combinational circuit into a functionally equivalent pipelined cir- 
cuit with minimum latency and clock-period no greater than a given upper bound c. 
The algorithm runs in O(E) steps, where E is the number of interconnections in
the circuit, and is optimal within a constant factor. We give a novel and concise 
characterization of the minimum clock-period of a circuit in terms of the maximum 
delay-to-register ratio cycle in the circuit. We show that this ratio does not exceed the 
minimum feasible clock-period by more than the maximum delay D of the elements in 
the circuit. This characterization leads to an O(E lg D) algorithm for minimum
clock-period pipelining of combinational circuitry with latency no greater than a given
upper bound $l$, an $O(\min\{V^{1/2} E \lg(VD),\, VE\})$ algorithm for minimum clock-period
retiming of unit-delay circuitry, an O(VE lg D) algorithm for minimum clock-period
retiming of general circuitry, and an $O(\min\{V^{1/2} E \lg(VW) \lg(VD),\, VE \lg(VD)\})$
algorithm for approximately minimum clock-period retiming, where V is the number of
processing elements in the circuit. We demonstrate the closed semiring structure of 
retiming on unit-delay circuits under a given clock-period constraint. Finally, we give 
an O(V?lg¢V) algorithm for a mixed-integer optimization problem which arises in the 
linear programming framework of retiming. 
Chapter 1


Introduction 


Speed of design is essential in building large-scale, highly-complex systems. This 
issue becomes more apparent, since emerging VLSI technologies lead to systems of 
increasing size and complexity. Design automation accelerates the design process by 
providing tools that improve the quality of a quickly designed circuit. Retiming, which 
was introduced in [13, 14, 15] and treated in [17], is a well-known design automation 
technique which aims at speeding the design process, without sacrificing the quality 
of the implementation. Retiming optimizes clocked circuits by relocating registers so 
as to reduce combinational rippling. In this thesis we further investigate retiming and 
provide results of practical as well as theoretical interest. We present optimal
algorithms for optimization of combinational circuitry. We give a novel characterization of
the minimum clock-period of a circuit in terms of the maximum delay-to-register ratio
cycle in the circuit, which leads to improved algorithms for minimum clock-period
and approximately minimum clock-period retiming. We exhibit the group theoretical 
structure of retiming on circuits with unit-delay components. Finally, we give an ef- 
ficient algorithm for a mixed-integer optimization problem, which arises in the linear 
programming framework of retiming. 

In Chapter 2 we introduce the basic concepts of retiming. We define the notations 
and terminology and review the graph-theoretic model of digital circuits from [15, 17]. 
We give an algorithm that transforms a given combinational circuit into a functionally 
equivalent pipelined circuit with minimum latency and clock-period no greater than 
a given upper bound c. The algorithm runs in O(E) steps, where E is the number of
interconnections in the circuit, and is optimal within a constant factor. The operation
of the algorithm is based on the notion of accumulated delay along a path in the circuit.


In Chapter 3 we give a novel and concise characterization of the minimum feasible 
clock-period of a circuit in terms of the maximum delay-to-register ratio cycle in the 
circuit graph. We prove that this ratio does not exceed the minimum feasible clock- 
period by more than an additive factor of D, where D is the maximum delay of the 
processing elements in the circuit. This observation establishes a range of possible 
values for the minimum clock-period, that is independent of the size of the circuit. 
The range depends solely on the delays of the individual components used. 

Based on the maximum ratio cycle characterization of the minimum clock-period
we approach a variety of retiming problems. For combinational circuits we give an
optimal O(E) algorithm that transforms a unit-delay combinational circuit into a
pipelined circuit with minimum clock-period and latency no greater than a given upper
bound $l$. We also give a more general O(E lg D) algorithm for the same problem on
combinational circuits with arbitrary delays. We show how to obtain a minimum
clock-period retiming of a unit-delay circuit in $O(\min\{V^{1/2} E \lg(VW),\, VE\})$ steps,
where V is the number of processing elements in the circuit and W is the maximum
number of registers on a wire in the circuit, by direct application of graph-theoretic
algorithms for finding the minimum cycle mean in a graph [11, 20]. We demonstrate
how to obtain a minimum clock-period retiming of a circuit with arbitrary delays
in O(VE lg D) steps. The best previously known strongly polynomial algorithm for
minimum clock-period retiming of synchronous circuitry, unit-delay or arbitrary-delay,
required O(VE lg V) steps [17]. Finally, if the retimed circuit is allowed a clock-period
which does not exceed the minimum possible by more than D, we show how to obtain
it in $O(\min\{V^{1/2} E \lg(VW) \lg(VD),\, VE \lg(VD)\})$ steps. The running times of the
algorithms in Chapters 2 and 3 are summarized in the table of Figure 1.1.

In Chapter 4 we investigate group-theoretic properties of retiming. We demon- 
strate the closed semiring structure of retiming on unit-delay circuits and we give a 
Bellman-Ford type algorithm, with redefined additive and multiplicative operations, 
for unit-delay circuitry retiming. Its running time is O(VE) and matches the best 
previously known strongly polynomial algorithm for the same problem [17]. 

In Chapter 5 we investigate a mixed-integer optimization problem, which arises in
the linear programming framework of retiming. We give a polynomial time algorithm
for a generic mixed-integer optimization problem, that we call restricted mixed-integer
dual of an uncapacitated minimum-cost flow. The polynomial running time is achieved
by introducing a set of additional, appropriately chosen constraints. The same idea
was used for the solution of a similar mixed-integer problem in [22, 16], which did
not involve, however, an objective to be optimized. The technique of introducing
additional constraints, or cuts as they are known in the literature, in order to solve
mixed-integer optimization problems, is known in general to require an exponential
number of steps [21, 23, 3, 18]. Aharoni, Erdős and Linial [1] pose the question
whether a clever choice of cuts can yield polynomial time algorithms. We show that
this is possible for the problem we consider, by choosing the cuts in a way that reduces
the original mixed-integer problem to a network flow problem.

Circuit            Transformation            Running time
Combinational      Min Latency Pipelining    O(E)
UD Combinational   Min Clock-Period          O(E)
Combinational      Min Clock-Period          O(E lg D)
UD Sequential      Min Clock-Period          O(min{V^{1/2} E lg(VW), VE})
Sequential         Min Clock-Period          O(VE lg D)
Sequential         Approx Min Clock-Period   O(min{V^{1/2} E lg(VW) lg(VD), VE lg(VD)})

Figure 1.1: Summary of problems and running times of corresponding algorithms. For
the sake of simplicity we denote a set S and its cardinality |S| by the same symbol.
The initials UD denote unit-delay circuitry.


Chapter 2 


Minimum Latency Pipelining 


In this chapter we review the basic concepts of retiming and describe an O(E) algo- 
rithm for minimum latency pipelining of combinational circuitry. The running time 
of the algorithm is optimal within a constant factor. The chapter is organized as fol- 
lows. Section 2.1 defines the terminology used in the rest of the paper and presents 
a mathematical framework of retiming. Section 2.2 gives a brief overview of the re- 
lation between the problem of satisfying a given set of difference constraints and the 
problem of finding single-source shortest-paths in a graph. This relation serves as a 
basis for proving the correctness of our algorithm for minimum latency pipelining of 
combinational circuitry. Both the algorithm and its correctness proof are given in
Section 2.3.


2.1 Preliminaries 


In this section we define the notations and terminology needed in the rest of the paper 
and present the graph-theoretic model of digital circuits assumed. We also describe 
the operation of retiming and present a mathematical framework for it. The entire 
framework presented in this section was introduced in [13, 14, 15] and was treated 
thoroughly in [17]. 

We view a circuit abstractly as a network of functional elements and globally 
clocked registers. The functional elements provide the computational power of the 
circuit and the registers act as storage elements. Each functional element has an
associated propagation delay. The outputs of a functional element at any time are
defined as a specified function of its inputs, provided that all the inputs have been
stable for a time at least equal to the element’s propagation delay.

We model a circuit as a finite, vertex-weighted, edge-weighted, directed multigraph
$G = (V, E, d, w)$. The vertices of the graph model the functional elements of the circuit.
Each vertex $v$ is weighted with its numerical propagation delay $d(v)$. The directed
edges $E$ of the graph model interconnections between functional elements. Each edge
$u \to v \in E$ connects an output of some functional element represented by vertex $u$ to
an input of some functional element represented by vertex $v$. Each edge $e$ is labeled
with a register count $w(e)$, which equals the number of registers along the connection.
We impose the restriction that there be no directed cycles in $G$ of zero edge-weight,
thereby ensuring that no race conditions can arise. We define the clock-period $\Phi(G)$
for any synchronous circuit $G$ as the maximum amount of propagation delay through
which any signal must ripple between clock ticks.
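
As an illustration of this definition, the following minimal Python sketch computes the clock-period of a small circuit graph as the largest total delay along any register-free path. The function name `clock_period`, the dictionary and edge-list representation, and the three-vertex example are assumptions made for illustration only; they do not come from the cited references.

```python
from collections import defaultdict, deque

def clock_period(delays, edges):
    """Largest total delay along any path that contains no register.

    delays: dict mapping each vertex v to its propagation delay d(v)
    edges:  list of (u, v, w) triples, w being the register count w(e) >= 0
    """
    # Only edges with zero registers let a signal ripple within one tick.
    succ = defaultdict(list)
    indeg = {v: 0 for v in delays}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    # arrival[v] = largest delay of a register-free path ending at v.
    arrival = dict(delays)
    queue = deque(v for v in delays if indeg[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delays[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(delays):
        # A cycle of zero edge-weight violates the restriction stated above.
        raise ValueError("directed cycle of zero edge-weight")
    return max(arrival.values())

# Hypothetical example: a ring a -> b -> c -> a with two registers.
delays = {"a": 3, "b": 2, "c": 4}
edges = [("a", "b", 0), ("b", "c", 1), ("c", "a", 1)]
print(clock_period(delays, edges))  # 5, along the register-free path a -> b
```

The sketch processes the register-free subgraph in topological order, so each vertex and each edge is handled once.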

We shall view a simple path $p = u \leadsto v$ in $G$ as a sequence of vertices and edges,
with no repetitions, that starts from a vertex $u$ and ends at a vertex $v$. For any path
$p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$, we define the path weight as the sum of the weights of the
edges of the path:

\[ w(p) = \sum_{i=0}^{k-1} w(e_i). \]

We also define the path delay as the sum of the delays of the vertices of the path:

\[ d(p) = \sum_{i=0}^{k} d(v_i). \]


In order that a graph $G = (V, E, d, w)$ have well-defined physical meaning as a
circuit, we place the restriction that the propagation delays $d(v)$ and the register
counts $w(e)$ are nonnegative integers for each vertex $v \in V$ and for each edge $e \in E$.

Retiming transformations alter the clock-period of a circuit by inserting and deleting
registers, but without otherwise affecting the circuit’s structure. The new circuit is
functionally equivalent, as seen by the external world, to the original. A proof of this
equivalence can be found in [15], which also contains a technical definition of the term
“equivalent”. A retiming of a circuit $G = (V, E, d, w)$ is an integer-valued vertex-labeling
$r : V \to \mathbb{Z}$. The retiming specifies a transformation of the original circuit in which
registers are added and removed so as to change the graph $G$ into a new graph
$G_r = (V, E, d, w_r)$. The edge-weighting $w_r$ is defined for an edge $e = u \to v$ by the equation

\[ w_r(e) = w(e) + r(u) - r(v), \]

and the label $r(v)$ is referred to as the lead of vertex $v$. A retiming $r$ of a circuit is
legal if the register counts $w_r$ of the retimed circuit $G_r$ are nonnegative, thus ensuring
that no edge may have a negative register count.
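
The effect of a retiming on the register counts can be sketched in a few lines of Python. The helper names `retime` and `is_legal` and the example labeling are hypothetical and chosen only for illustration; the sketch simply evaluates the equation for $w_r(e)$ on every edge and tests nonnegativity.

```python
def retime(edges, r):
    """Apply w_r(e) = w(e) + r(u) - r(v) to every edge (u, v, w)."""
    return [(u, v, w + r[u] - r[v]) for u, v, w in edges]

def is_legal(edges, r):
    """A retiming is legal when no retimed register count is negative."""
    return all(w_r >= 0 for _, _, w_r in retime(edges, r))

# Hypothetical ring a -> b -> c -> a carrying two registers in total.
edges = [("a", "b", 0), ("b", "c", 1), ("c", "a", 1)]

r = {"a": 0, "b": -1, "c": -1}
print(retime(edges, r))    # [('a', 'b', 1), ('b', 'c', 1), ('c', 'a', 0)]
print(is_legal(edges, r))  # True

bad = {"a": 0, "b": 1, "c": 0}
print(is_legal(edges, bad))  # False: edge a -> b would hold -1 registers
```

In the example the total register count around the ring stays at two, since the leads cancel along any cycle.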

In order to characterize the clock-period of a retimed circuit we define two quantities:

\begin{align*}
W(u,v) &= \min\{ w(p) : p \text{ is a path } u \leadsto v \}, \\
D(u,v) &= \max\{ d(p) : p \text{ is a path } u \leadsto v \text{ and } w(p) = W(u,v) \}.
\end{align*}

The quantity $W(u, v)$ is the minimum number of registers on any path from vertex $u$
to vertex $v$. We call a path $p$ from $u$ to $v$ such that $w(p) = W(u, v)$ a critical path from u to
v and we denote it by u 2 v. The quantity D(u, v) is the maximum total propagation 
delay on any critical path from u to v. 


The following two statements about D are important:

F1 $D(u,v)$ can take on $O(V^2)$ values.

F2 Given a synchronous circuit $G$ and a retiming $r$ of $G$, the clock-period $\Phi(G_r)$
is equal to $D(u,v)$ for some $u, v \in V$.

Statements F1 and F2 are easily justified by the fact that there are $O(V^2)$ pairs of
vertices in the graph and that retiming does not change the propagation delay along
a critical path between any two vertices in the graph.

We can compute W and D by solving an all-pairs shortest-paths problem in G.
Common ways of solving this problem are the Floyd-Warshall method [12, page 86],
which runs in $O(V^3)$ time, and Johnson’s algorithm [10], which runs in $O(VE + V^2 \lg V)$
time using the Fibonacci heap data structure due to Fredman and Tarjan [6].
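
One way to carry out this computation is sketched below in Python, under an assumption that is not spelled out in the text above: each edge $u \to v$ is given the ordered pair $(w(e), -d(u))$ as its weight, pairs are added componentwise and compared lexicographically, and the Floyd-Warshall recurrence is run on these pairs. The function name `compute_W_D` and the data layout are illustrative choices.

```python
import itertools

def compute_W_D(delays, edges):
    """All-pairs W(u,v) and D(u,v) by Floyd-Warshall on weight pairs.

    Assumed encoding: edge u -> v weighs (w(e), -d(u)); a shortest pair
    (x, y) from u to v then gives W(u,v) = x and D(u,v) = d(v) - y.
    """
    inf = float("inf")
    verts = list(delays)
    dist = {u: {v: (inf, 0) for v in verts} for u in verts}
    for u in verts:
        dist[u][u] = (0, 0)            # trivial path: no edges, no registers
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], (w, -delays[u]))
    # product(..., repeat=3) varies k slowest, as Floyd-Warshall requires.
    for k, i, j in itertools.product(verts, repeat=3):
        (w1, y1), (w2, y2) = dist[i][k], dist[k][j]
        if w1 == inf or w2 == inf:
            continue
        cand = (w1 + w2, y1 + y2)
        if cand < dist[i][j]:          # lexicographic comparison of pairs
            dist[i][j] = cand
    W, D = {}, {}
    for u in verts:
        for v in verts:
            x, y = dist[u][v]
            if x != inf:
                W[u, v] = x
                D[u, v] = delays[v] - y
    return W, D

# Hypothetical ring a -> b -> c -> a with delays 3, 2, 4.
delays = {"a": 3, "b": 2, "c": 4}
edges = [("a", "b", 0), ("b", "c", 1), ("c", "a", 1)]
W, D = compute_W_D(delays, edges)
print(W["a", "b"], D["a", "b"])  # 0 5
print(W["a", "c"], D["a", "c"])  # 1 9
```

Because every directed cycle carries at least one register, no cycle has a negative pair weight, so the recurrence is well defined.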

The following theorem, which is proven in [17], characterizes the conditions under
which a retiming produces a circuit whose clock-period is no greater than a given
constant.

Theorem 2.1 Let $G = (V, E, d, w)$ be a synchronous circuit, let $c$ be an arbitrary
positive real number, and let $r$ be a function from $V$ to the integers. Then $r$ is a legal
retiming of $G$ such that $\Phi(G_r) \le c$ if and only if

\[ r(v) - r(u) \le w(e) \tag{2.1} \]

for every edge $e = u \to v$ of $G$, and

\[ r(v) - r(u) \le W(u,v) - 1 \tag{2.2} \]

for all vertices $u, v \in V$ such that $D(u,v) > c$. □


This theorem provides the basic tool needed to solve the retiming problem for a
given clock-period. Notice that the constraints on the unknowns $r(v)$ in the theorem
are linear inequalities involving only differences of unknowns. Using the Bellman-Ford
algorithm [12, page 74] we can test whether there exists a retimed circuit with
clock-period no greater than some constant $c$ in $O(V^3)$ steps, since there can be $O(V^2)$
inequalities of the form (2.2). Leiserson and Saxe [17] give an asymptotically faster
algorithm, which runs in O(VE) steps.


2.2 Difference Constraints and Shortest-Paths 


In this section we exhibit the relation between the problem of satisfying a given set of 
difference constraints and the problem of finding single-source shortest-paths in a graph 
generated by the given set of constraints. We also give without proof an important 
property of the single-source shortest-paths solution [12, 4]. The framework that we
develop in this section will be used extensively in the rest of this thesis.


We consider the problem of solving the following system of difference constraints.

Problem DC (Difference Constraints) Let $S$ be a set of $m$ linear constraints of the
form

\[ x_j - x_i \le a_{ij} \tag{S} \]

on the $n$ unknowns $x_1, x_2, \ldots, x_n$, where $a_{ij}$ are given real constants. Determine a set
of feasible values for the unknowns $x_i$ or determine that no such set exists. □
The given system $S$ induces an edge-weighted graph $G = (V, E, w)$. The vertex set
$V$ is defined as

\[ V = \{ v : x_v \text{ is an unknown of } S \}. \]

The edge set $E$ is defined as

\[ E = \{ u \to v : x_v - x_u \le a_{uv} \text{ is a constraint of } S \}. \]

Finally, for every edge $e = u \to v \in E$ we have

\[ w(e) = a_{uv}. \]


Now, we define the single-source shortest-paths problem on an edge-weighted graph
$G = (V, E, w)$ from a source-vertex $s \in V$.

Problem SSSP (Single-Source Shortest-Paths) Let $G = (V, E, w)$ be an edge-weighted
graph and let $s$ be a vertex in $V$. Determine a value $l(v)$ for each vertex $v \in V$ such
that

\[ l(v) = \min\{ w(p) : p \text{ is a path } s \leadsto v \}. \] □


We give without proof three important lemmata [12].

Lemma 2.2 Problem DC is solvable if and only if Problem SSSP is solvable. □

Lemma 2.3 Problem SSSP is solvable if and only if there exist no directed cycles $C$
in $G$ with weight $w(C) < 0$. □

Lemma 2.4 Let $S$ be a system of $m$ difference constraints of the form

\[ x_j - x_i \le a_{ij} \]

on the $n$ unknowns $x_1, x_2, \ldots, x_n$, where $a_{ij}$ are given real constants. Let $G = (V, E, w)$
be the graph induced by $S$, and let $l(v)$ be the length of the shortest path in $G$ from
the source $s \in V$ to vertex $v$. Then the assignment $x_v = l(v)$ for each vertex $v \in V$
satisfies the constraints in $S$ and maximizes $x_v - x_s$ for every vertex $v \in V$. □

These three lemmata will be used extensively in the correctness proofs of the
algorithms that we present in the rest of the thesis.
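
The correspondence between difference constraints and shortest paths can be sketched with a short Bellman-Ford routine in Python. The sketch makes one assumption beyond the text: instead of a designated source $s \in V$, it starts every distance at zero, which is equivalent to adding an extra source with zero-weight edges to all vertices so that every vertex is reachable. The function name and the 0-based indexing of unknowns are illustrative.

```python
def solve_difference_constraints(n, constraints):
    """Solve x_j - x_i <= a for triples (i, j, a) over unknowns 0..n-1.

    Returns a feasible list of values, or None when the induced graph
    contains a directed cycle of negative weight (cf. Lemma 2.3).
    """
    # Assumed virtual source: all distances start at 0.
    dist = [0] * n
    for _ in range(n):
        changed = False
        for i, j, a in constraints:
            # Constraint x_j - x_i <= a is the edge i -> j of weight a.
            if dist[i] + a < dist[j]:
                dist[j] = dist[i] + a
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after n passes: negative cycle

# Feasible system: x1 - x0 <= 3, x2 - x1 <= -2, x0 - x2 <= 1.
system = [(0, 1, 3), (1, 2, -2), (2, 0, 1)]
x = solve_difference_constraints(3, system)
print(x)  # [-1, 0, -2]

# Infeasible system: x1 - x0 <= -1 and x0 - x1 <= 0 form a negative cycle.
print(solve_difference_constraints(2, [(0, 1, -1), (1, 0, 0)]))  # None
```

Each pass relaxes all m constraints, so the sketch runs in O(nm) steps.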
2.3 The Algorithm


This section introduces the problem of minimum latency pipelining of combinational 
circuitry and presents an efficient algorithm for its solution. The algorithm terminates 
in O(E) steps, and its performance is optimal within a constant factor. Its running 
time is a significant improvement over the O(VE) running time of the previously 
known techniques for the general retiming problem. 

In a combinational circuit all register counts are zero and thus the circuit graph is
acyclic. We consider the circuit to have one input interface $v_I$ and one output interface
$v_O$. By retiming a combinational circuit $G$, we can produce a pipelined circuit $G_r$ which
achieves a shorter clock-period at the cost of introducing a latency of $r(v_I) - r(v_O)$
clock ticks for signals to propagate from the input interface $v_I$ to the output interface
$v_O$.

The problem of minimum latency pipelining is defined as follows: Given a combinational
circuit $G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, and
a positive integer $c$, find a legal retiming $r$ of $G$ such that $\Phi(G_r) \le c$ and the latency
$r(v_I) - r(v_O)$ of the retimed circuit is as small as possible. Stated in mathematical
terms, we want to solve the following problem:

Problem MLP (Minimum Latency Pipelining) Given a combinational circuit
$G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, determine a value
$r(v)$ for each vertex $v \in V$ that minimizes $r(v_I) - r(v_O)$ subject to

\[ r(v) - r(u) \le 0 \tag{2.3} \]

for every edge $u \to v \in E$, and

\[ r(v) - r(u) \le -1 \tag{2.4} \]

for all vertices $u, v \in V$ such that $D(u,v) > c$. □


According to Section 2.2, Problem MLP can be viewed as a single-source shortest-paths
problem on the constraint graph $G_c = (V_c, E_c, w_c)$, which is defined in the following
manner.

\begin{align*}
V_c &= V, \\
E_c &= \{ u \to v : r(v) - r(u) \text{ is constrained by (2.3) or (2.4)} \}, \\
w_c(e) &= \begin{cases}
  0  & \text{if } r(v) - r(u) \text{ is constrained by (2.3)}, \\
  -1 & \text{if } r(v) - r(u) \text{ is constrained by (2.4)}.
\end{cases}
\end{align*}

A feasible assignment of values to the unknowns of Problem MLP can be obtained in
O(VE) steps by using the general techniques described in Section 2.1.

We present Algorithm MLP, which yields a solution to Problem MLP in O(E)
steps. The running time of the algorithm is optimal within a constant factor. For each
vertex $v$ in the graph, Algorithm MLP maintains its stage $|r(v)|$ and its accumulated
delay $\delta(v)$. The stage of a vertex $v$ is the number of registers along any path from the
input interface $v_I$ to the vertex $v$. The accumulated delay of a vertex $v$ is the longest
delay of a signal coming into that vertex from a preceding register. The algorithm
operates as follows:


Algorithm MLP (Minimum Latency Pipelining) Given a combinational circuit $G$
and a desired clock-period $c$, this algorithm determines a pipelined combinational
circuit $G_r$ with clock-period $\Phi(G_r) \le c$ and minimum latency.

1. For each vertex $v \in V$, set $r(v) \leftarrow 0$ and $\delta(v) \leftarrow d(v)$.
2. Visit the edges $u \to v$ in topological sort order. For each edge $u \to v$ do:
   2.1. If $r(v) > r(u)$, then $r(v) \leftarrow r(u)$.
   2.2. If $\delta(u) + d(v) > c$ and $r(v) \ge r(u)$, then $r(v) \leftarrow r(u) - 1$.
   2.3. If $\delta(u) + d(v) > \delta(v)$ and $r(u) = r(v)$, then $\delta(v) \leftarrow \delta(u) + d(v)$.
3. For each edge $e = u \to v \in E$, set $w_r(e) = w(e) + r(u) - r(v)$. □
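
The steps of Algorithm MLP can be transcribed into Python roughly as follows. This is a minimal sketch under stated assumptions: "topological sort order" of the edges is interpreted as processing the outgoing edges of each vertex only after all of its incoming edges have been processed, the circuit is given as a delay dictionary and a list of `(u, v)` pairs, and the function name and the four-element chain are hypothetical.

```python
from collections import defaultdict, deque

def pipeline_min_latency(delays, edges, c):
    """Transcription of steps 1-3 of Algorithm MLP for a DAG.

    delays: dict v -> d(v); edges: list of (u, v) pairs, all with w(e) = 0;
    c: clock-period bound. Returns (r, delta, retimed_edges).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in delays}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    r = {v: 0 for v in delays}      # step 1: leads start at zero
    delta = dict(delays)            # step 1: accumulated delay is d(v)
    # A vertex is dequeued only once all its incoming edges were visited,
    # so the edges are visited in a topological order.
    queue = deque(v for v in delays if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if r[v] > r[u]:                                        # step 2.1
                r[v] = r[u]
            if delta[u] + delays[v] > c and r[v] >= r[u]:          # step 2.2
                r[v] = r[u] - 1
            if delta[u] + delays[v] > delta[v] and r[u] == r[v]:   # step 2.3
                delta[v] = delta[u] + delays[v]
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Step 3 with w(e) = 0: the new register count is r(u) - r(v).
    retimed = [(u, v, r[u] - r[v]) for u, v in edges]
    return r, delta, retimed

# Hypothetical chain of four elements, each with delay 2, and bound c = 4.
delays = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
r, delta, retimed = pipeline_min_latency(delays, edges, 4)
print(r)        # {'a': 0, 'b': 0, 'c': -1, 'd': -1}
print(retimed)  # [('a', 'b', 0), ('b', 'c', 1), ('c', 'd', 0)]
```

In the example a single register is placed on the edge from b to c, which gives a latency of one clock tick and two stages of delay 4 each.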


The idea behind Algorithm MLP is to visit the vertices of the graph keeping track 
of the longest propagation delay up to the vertex currently visited. New registers 
are introduced according to a greedy criterion: whenever the longest propagation 
delay exceeds the desired clock-period c, a pipeline stage is introduced. Visiting the 
edges in topological sort order ensures that whenever an edge is considered all the 
preceding edges in the graph have been taken into account. Step 2.1 of the algorithm
ensures that no succeeding vertex belongs to a higher pipeline stage. Step 2.2 ensures
that whenever the longest propagation delay along a register-free path leading from a
preceding vertex to the currently visited vertex exceeds the desired clock-period c, a
new pipeline stage is introduced. Finally, step 2.3 ensures that once all the incoming
edges of a vertex v have been processed, the maximum propagation delay along a
register-free path leading from a preceding vertex to vertex v is maintained.

Algorithm MLP terminates, since the number of edges is finite and it executes a
finite number of operations per edge. In fact the algorithm runs quickly, as is shown
by the following lemma.
Lemma 2.5 Algorithm MLP terminates in O(E) steps on a circuit $G = (V, E, d, 0)$.

Proof: Steps 1 and 3 require O(E + V) steps. Sorting the edges of a directed acyclic
graph in topological order requires O(E) time [4]. In step 2 each edge is visited
exactly once and the number of operations per edge is bounded by a constant. By the
time the algorithm terminates, therefore, it has executed O(E) steps, assuming $V \le E - 1$. □


In order to demonstrate the correctness of Algorithm MLP we proceed in two
stages. First we show that Algorithm MLP yields a set of values for $r(v)$ that
satisfies (2.3) and (2.4). Then, we show that this set is a single-source shortest-paths
solution in the constraint graph $G_c$, thereby ensuring, according to Lemma 2.4, that
$r(v_O) - r(v_I)$ is maximized. It follows directly that the set of values $r(v)$ is a legal
retiming that minimizes the latency $r(v_I) - r(v_O)$.
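Lemma 2.6 Algorithm MLP yields a solution that satisfies (2.3).

Proof: Only steps 2.1 and 2.2 of the algorithm change $r(v)$, and both can only
decrease it. Because the edges are visited in topological sort order, $r(u)$ has its final
value when an edge $u \to v$ is visited; step 2.1 then enforces $r(v) \le r(u)$, and no later
step increases $r(v)$. Hence $r(v) - r(u) \le 0$ for every edge $u \to v$ in $E$. □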
Lemma 2.7 Algorithm MLP yields a solution that satisfies (2.4).

Proof: Assume for the sake of contradiction that for some pair of vertices $(u_0, u_k)$,
there exists a path $p = u_0 \to u_1 \to \cdots \to u_{k-1} \to u_k$ in $G$ with propagation
delay $d(p) > c$ such that $r(u_k) - r(u_0) > -1$ or, equivalently, $r(u_0) \le r(u_k)$. The
inequality $r(u_0) \le r(u_k)$ and transitive application of inequality (2.3) imply that
$r(u_i) = r(u_j)$ for all vertices $u_i, u_j \in p$. In this case step 2.3 of the algorithm ensures
that $\delta(u_{k-1}) + d(u_k) \ge d(p)$, which in turn implies that $\delta(u_{k-1}) + d(u_k) > c$. When
visiting vertex $u_k$ the algorithm detects this condition in step 2.2 and enforces
$r(u_k) < r(u_{k-1})$, which contradicts the fact that $r(u_i) = r(u_j)$ for all vertices $u_i, u_j \in p$. □


In order to show that the values $r(v)$ given by Algorithm MLP are a single-source
shortest-paths solution in the constraint graph $G_c$, we must prove two basic lemmata
first.
Lemma 2.8 At any point of the algorithm, we have $d(v) \le \delta(v) \le c$.

Proof: The relation $d(v) \le \delta(v)$ clearly holds at any point of the algorithm, since
initially $d(v) = \delta(v)$ and $\delta(v)$ is never decreased.

Now, for the second part of the inequality, observe that $\delta(v)$ increases in step 2.3
only. For the sake of contradiction assume that for some edge $u \to v$ the relation
$\delta(v) > c$ holds after the execution of step 2.3. It follows that the preconditions
$\delta(v) = \delta(u) + d(v) > c$ and $r(u) = r(v)$ of step 2.3 must have been satisfied prior to
its execution. But, from the immediately previous step 2.2 we have that $r(u) > r(v)$,
since $\delta(u) + d(v) > c$, which contradicts the fact that $r(u) = r(v)$. □


Lemma 2.9 For every vertex $v$ that has had all its incoming edges visited by the
algorithm we have $\Delta(v) \le c|r(v)| + \delta(v)$, where $\Delta(v)$ denotes the maximum possible
delay from $v_I$ to $v$ along any path in $G$.

Proof: The proof is by induction on the vertices that have had all their incoming
edges visited by the algorithm. Initially, vertex $v_I$ has had all its incoming edges
trivially visited by the algorithm, since the indegree of $v_I$ is zero, and $r(v_I) = 0$.
Since the longest path from $v_I$ to itself is the trivial path with no edges, we infer that
$\Delta(v_I) = d(v_I)$. Also, we have that $\delta(v_I) = d(v_I)$. Therefore $\Delta(v_I) \le c|r(v_I)| + \delta(v_I)$
holds.

Now, consider the inductive step. Since the edges are visited in topological sort
order, whenever all the incoming edges of a vertex $v$ have been visited all the incoming
edges of the vertices $u_i$ with edges $u_i \to v$ have been visited as well. Assume for the
sake of contradiction that $\Delta(v) > c|r(v)| + \delta(v)$ holds after having visited all the
incoming edges $u_i \to v$ of vertex $v$. Then, we have:

\[ d(v) + \max\{ \Delta(u_i) : u_i \to v \in E \} > c|r(v)| + \delta(v), \tag{2.5} \]

which implies

\[ d(v) + \max\{ c|r(u_i)| + \delta(u_i) : u_i \to v \in E \} > c|r(v)| + \delta(v), \tag{2.6} \]

since $\Delta(u_i) \le c|r(u_i)| + \delta(u_i)$ by the inductive assumption. Let the maximum in the
left hand side of (2.6) occur for $i = i'$. Now, consider the three possible orderings of
$r(u_{i'})$ and $r(v)$:

Case 1: $r(u_{i'}) < r(v)$. This case is impossible, because steps 2.1 and 2.2 of the
algorithm ensure that $r(v)$ can only decrease.

Case 2: $r(u_{i'}) = r(v)$. In this case we have from (2.6):

\begin{align*}
c|r(v)| + \delta(v) &< d(v) + \max\{ c|r(u_i)| + \delta(u_i) : u_i \to v \in E \} \\
  &= d(v) + c|r(u_{i'})| + \delta(u_{i'}) \\
  &= d(v) + c|r(v)| + \delta(u_{i'}) \\
  &\le d(v) + c|r(v)| + \max\{ \delta(u_i) : u_i \to v \in E \},
\end{align*}

which implies that

\[ \delta(v) < d(v) + \max\{ \delta(u_i) : u_i \to v \in E \}. \]

But from step 2.3 we have $d(v) + \max\{ \delta(u_i) : u_i \to v \in E \} = \delta(v)$, which is a
contradiction.

Case 3: $r(u_{i'}) > r(v)$. In this case we have $|r(v)| \ge |r(u_{i'})| + 1$, which implies

\begin{align*}
c|r(v)| &\ge c|r(u_{i'})| + c \\
  &\ge c|r(u_{i'})| + \delta(u_{i'}) \\
  &= \max\{ c|r(u_i)| + \delta(u_i) : u_i \to v \in E \}.
\end{align*}

Since $\delta(v) \ge d(v)$ from Lemma 2.8, the last inequality implies

\[ c|r(v)| + \delta(v) \ge d(v) + \max\{ c|r(u_i)| + \delta(u_i) : u_i \to v \in E \}, \]

which contradicts inequality (2.6). □


Now, using Lemma 2.9 we can prove that the values $r(v)$ given by Algorithm
MLP are the lengths of the shortest-paths in the constraint graph $G_c$ from the input
interface $v_I$.
Lemma 2.10 Let $l(v)$ be the length of the shortest path in $G_c$ from $v_I$ to $v$. Then
Algorithm MLP sets $r(v) = l(v)$.

Proof: Assume for the sake of contradiction that the length $l(v)$ of the actual shortest
path $p$ in $G_c$ satisfies $l(v) < r(v) \le 0$. This inequality implies that $p$ traverses at least
one $-1$ edge more than the shortest path indicated by Algorithm MLP. Consequently,
there exists a path $p$ from $v_I$ to $v$ in $G$ with propagation delay $d(p)$ such that
$d(p) \ge c|l(v)| + 1$, since $d(v) \ge 1$ for every vertex $v \in V$. From Lemma 2.9, however, the
maximum possible delay $\Delta(v)$ from $v_I$ to $v$ along any path in $G$ satisfies

\begin{align*}
\Delta(v) &\le c|r(v)| + \delta(v) \\
  &\le c|r(v)| + c \\
  &\le c(|l(v)| - 1) + c \\
  &= c|l(v)| \\
  &< c|l(v)| + 1,
\end{align*}

implying $\Delta(v) < d(p)$, which is a contradiction. □


Combining Lemmata 2.4, 2.6, 2.7 and 2.10, we obtain the following theorem.

Theorem 2.11 Algorithm MLP correctly solves Problem MLP. □

This theorem completes the correctness proof of Algorithm MLP.

In summary, in this chapter we presented and proved the correctness of a greedy
strategy for pipelining combinational circuitry. The clock-period of the pipelined
circuit is guaranteed not to exceed a specified upper bound c and its latency is guaranteed
to be minimal under the given clock-period constraint. The running time of the
algorithm is directly proportional to the number of interconnections in the circuit and
is optimal within a constant factor. The given procedure is used extensively as a
subroutine of the algorithms in the following chapter.


Chapter 3 


Minimum Clock-Period 
Characterization 


In this chapter we give a concise characterization of the minimum feasible clock-period 
of a circuit in terms of the maximum delay-to-register ratio of the directed cycles in the 
circuit graph. This characterization leads to improved algorithms for various retiming 
problems. 

The chapter is structured as follows. Section 3.1 introduces basic definitions that
are used throughout the chapter. Section 3.2 gives an exact characterization of the
minimum feasible clock-period for unit-delay circuits and Section 3.3 gives a range of
D values for the minimum feasible clock-period of general circuits, where D is the
maximum propagation delay of the circuit components. The previous ranges known
for both cases had $O(V^2)$ values.

The algorithmic implications of the minimum feasible clock-period characterizations
are given in the three subsections of Section 3.4. Section 3.4.1 gives an O(E)
algorithm for minimum clock-period pipelining of unit-delay combinational circuitry.
The running time of this algorithm is optimal within a constant factor. An O(E lg D)
algorithm for minimum clock-period pipelining of general combinational circuitry is also
presented in this section. Section 3.4.2 presents an $O(\min\{V^{1/2} E \lg(VD),\, VE\})$
algorithm for minimum clock-period retiming of unit-delay circuitry, and an O(VE lg D)
algorithm for minimum clock-period retiming of general circuitry. Finally, Section 3.4.3
gives an $O(\min\{V^{1/2} E \lg(VW) \lg(VD),\, VE \lg(VD)\})$ algorithm for determining a
retiming of a general circuit such that the clock-period is approximately minimized.
3.1 Preliminaries

In this section we give some basic definitions that we will use throughout the rest of
the chapter.

Let $G = (V, E, d, w)$ be a circuit graph. We denote by $D$ the maximum propagation
delay of the circuit components:

\[ D = \max\{ d(v) : v \in V \}. \]

We define the delay-to-register ratio $R(C)$ of a cycle $C = v_0 \to v_1 \to \cdots \to v_{k-1} \to v_0$
in the circuit $G$ as follows:

\[ R(C) = \frac{\sum_{v \in C} d(v)}{\sum_{e \in C} w(e)}. \]

We denote by $C^*(G)$ the directed cycle in $G$ with maximum delay-to-register ratio.
By definition $R(C^*(G)) \ge R(C)$ for every cycle $C$ in $G$.

A clock-period $c$ is called feasible for the circuit $G$ if and only if there exists a
retiming $r$ of $G$ such that $\Phi(G_r) \le c$. Finally, we denote by $\Phi_{\min}(G)$ the clock-period
of the retimed circuit $G_r$ with the smallest possible clock-period:

\[ \Phi_{\min}(G) = \min\{ \Phi(G_r) : r \text{ is a retiming of } G \}. \]
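
As a small worked instance of these definitions, consider a hypothetical circuit (not taken from the text) consisting of a single cycle $C = a \to b \to c \to a$ with delays $d(a) = 3$, $d(b) = 2$, $d(c) = 4$ and register counts $w(a \to b) = 0$, $w(b \to c) = 1$, $w(c \to a) = 1$. The quantities defined above then evaluate as follows:

```latex
\begin{align*}
D    &= \max\{3, 2, 4\} = 4, \\
R(C) &= \frac{d(a) + d(b) + d(c)}{w(a \to b) + w(b \to c) + w(c \to a)}
      = \frac{3 + 2 + 4}{0 + 1 + 1} = \frac{9}{2}.
\end{align*}
```

Since $C$ is the only cycle in this example, $C^*(G) = C$ and $R(C^*(G)) = 9/2$.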


3.2 Minimum Period for Unit-Delay Circuits

In this section we relate the minimum clock-period $\Phi_{\min}(G)$ that we can obtain by
retiming a given unit-delay circuit $G = (V, E, 1, w)$ with the maximum delay-to-register
ratio $R(C^*(G))$ of the cycles $C$ in the circuit graph $G$. Specifically, we show
that $\Phi_{\min}(G) = \lceil R(C^*(G)) \rceil$.

The result presented in this section relies on a retiming theorem in [17], which
gives a characterization of when a unit-delay circuit has a clock-period less than or
equal to $c$. The theorem is phrased in terms of the graph $G - 1/c$, which is defined as
$G - 1/c = (V, E, d, w')$ where $w'(e) = w(e) - 1/c$ for every edge $e \in E$. Thus, $G - 1/c$
is the graph obtained from $G$ by subtracting $1/c$ from the weight of each edge in $G$.
Theorem 3.1 Let $G = (V, E, 1, w)$ be a unit-delay synchronous circuit, and let $c$ be
any positive integer. Then there is a retiming $r$ of $G$ such that $\Phi(G_r) \le c$ if and only
if $G - 1/c$ contains no cycles having negative edge-weight. □

We can use Theorem 3.1 to characterize the minimum clock period $\Phi_{\min}(G)$ in
terms of the maximum delay-to-register ratio $R(C^*(G))$.


Theorem 3.2 Let G = (V, E,1,w) be a unit-delay synchronous circuit with mazimum 
delay-to-register ratio R(C*(G)). Let ®min(G) denote the minimum clock-period that 


can be obtained by retiming G. Then 
® nin(G) = [R(C*(G))] . 


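Proof: According to Theorem 3.1, a clock-period $c$ is feasible if and only if $G - 1/c$ has
no negative-weight cycles. Thus, for every cycle
$C = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-2}} v_{k-1} \xrightarrow{e_{k-1}} v_0$
in $G$ and any feasible clock-period $c$ we have:

\[ \sum_{i=0}^{k-1} \bigl( w(e_i) - 1/c \bigr) \ge 0. \]

Equivalently:

\[ c \ge k \Big/ \sum_{i=0}^{k-1} w(e_i). \]

The right hand side of the last inequality equals $R(C)$, by definition, because every vertex
has unit delay. Since this inequality holds for every cycle $C$ in $G$, it must also hold for
$C^*(G)$. Thus $c \ge R(C^*(G))$. Now, the integrality of $c$ implies that
$c \ge \lceil R(C^*(G)) \rceil$ for every feasible period $c$, and in particular
$\Phi_{\min}(G) \ge \lceil R(C^*(G)) \rceil$. Conversely, the steps above are reversible: for
$c = \lceil R(C^*(G)) \rceil$ every cycle of $G - 1/c$ has non-negative weight, so this $c$ is
feasible by Theorem 3.1. Hence $\Phi_{\min}(G) = \lceil R(C^*(G)) \rceil$. □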
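The characterization can be exercised on a small instance. The sketch below is an illustration only, not the algorithm of [17]: it tests feasibility through the condition of Theorem 3.1 using a textbook Bellman-Ford negative-cycle check, and scans the candidate periods linearly for clarity. The five-vertex ring and all function names are assumptions made for the example.

```python
from fractions import Fraction
from math import ceil

def has_negative_cycle(vertices, edge_weight):
    """Bellman-Ford from a virtual source that reaches every vertex."""
    dist = {v: Fraction(0) for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for (u, v), w in edge_weight.items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True  # still relaxing after |V| passes

def feasible_unit_delay(vertices, weight, c):
    """Condition of Theorem 3.1: G - 1/c has no negative-weight cycle."""
    shifted = {e: Fraction(w) - Fraction(1, c) for e, w in weight.items()}
    return not has_negative_cycle(vertices, shifted)

def min_period_unit_delay(vertices, weight):
    """Smallest feasible integer c; a unit-delay ratio never exceeds |V|."""
    for c in range(1, len(vertices) + 1):
        if feasible_unit_delay(vertices, weight, c):
            return c
    raise ValueError("some cycle carries no register")

# Hypothetical ring of five unit-delay elements with two registers.
ring = [0, 1, 2, 3, 4]
registers = {(0, 1): 1, (1, 2): 0, (2, 3): 1, (3, 4): 0, (4, 0): 0}
period = min_period_unit_delay(ring, registers)
expected = ceil(Fraction(5, 2))    # ceiling of R(C*) = 5/2
```

The only cycle of this ring has ratio 5/2, so the scan stops at the first integer that is at least 5/2, in agreement with Theorem 3.2.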


3.3 Minimum Period for General Circuits

In this section we relate the minimum clock-period $\Phi_{\min}(G)$ that we can obtain by
retiming a given general circuit $G = (V, E, d, w)$ to the delay-to-register ratios of
the cycles in the circuit graph $G$ and the propagation delays of the circuit components.
Specifically, we show that

\[ \lceil R(C^*(G)) \rceil \le \Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D, \]

where $D$ denotes the maximum propagation delay of the elements in the circuit, and
$C^*(G)$ denotes the cycle in $G$ with maximum delay-to-register ratio $R(C^*(G))$.
Observe that both the lower and the upper bound are independent of the size of the
circuit.

There is no known counterpart of Theorem 3.1 for general circuits. For this
reason we cannot obtain an exact characterization of $\Phi_{\min}(G)$ for general circuits
in the manner of the previous section. However,
we are still able to give tight bounds for $\Phi_{\min}(G)$, which are independent of the size
of the circuit.

The next theorem gives a necessary condition for a circuit to have a clock-period less
than or equal to $c$, and will be used to derive a lower bound for $\Phi_{\min}(G)$. The theorem
is phrased in terms of the graph $G - d/c$, which is defined as $G - d/c = (V, E, d, w')$,
where $w'(e) = w(e) - d(v)/c$ for every edge $u \xrightarrow{e} v \in E$.


Theorem 3.3 Let $G = (V, E, d, w)$ be a synchronous circuit, and let $c$ be any positive
integer. If there is a retiming $r$ of $G$ such that $\Phi(G_r) \le c$, then $G - d/c$ contains no
cycles having negative edge-weight.

Proof: Assume there exists a retiming $r$ of $G$ such that $\Phi(G_r) \le c$. Consider any cycle
$C$ in $G_r$. For every register-free path
$p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$ in the cycle
we have $\sum_{i=0}^{k} d(v_i) \le c$. Let $w_r(C) = \sum_{e_i \in C} w_r(e_i)$ be the number of
registers in $C$. Then, by adding the contributions from the $w_r(C)$ register-free paths in $C$, we get

\[ \sum_{v_i \in C} d(v_i) \le c \sum_{e_i \in C} w_r(e_i) \]

or, equivalently,

\[ \sum_{e_i \in C} w_r(e_i) - \sum_{v_i \in C} d(v_i)/c \ge 0. \]

Now, recall that $w_r(e) = w(e) + r(u) - r(v)$ for every edge $u \xrightarrow{e} v \in C$. Consequently,
the sum over the edges in $C$ telescopes, yielding

\[ \sum_{e_i \in C} w(e_i) - \sum_{v_i \in C} d(v_i)/c \ge 0. \]

Since this statement is true for every cycle $C$ in $G$, we conclude that $G - d/c$ contains
no cycles with negative edge-weight. □

As a direct consequence of Theorem 3.3, we have the following lower bound on the
minimum feasible clock-period of a general circuit:

Corollary 3.4 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register
ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain
by retiming $G$. Then

\[ \lceil R(C^*(G)) \rceil \le \Phi_{\min}(G). \]

Proof: For any feasible clock-period $c$, Theorem 3.3 implies

\[ \sum_{u \xrightarrow{e} v \,\in\, C} \bigl( w(e) - d(v)/c \bigr) \ge 0 \]

for every cycle $C$ in $G$. Equivalently, $c \ge R(C)$ for every cycle $C$ in $G$, which yields
$c \ge R(C^*(G))$ for $C = C^*(G)$. Since this lower bound holds for every feasible
clock-period, we have $R(C^*(G)) \le \Phi_{\min}(G)$. Given that the propagation delays of the
circuit components are integers, we infer that $\Phi_{\min}(G)$ must be an integer as well.
Therefore $\lceil R(C^*(G)) \rceil \le \Phi_{\min}(G)$. □

Observe that the converse of Theorem 3.3 is not true. Specifically, given a circuit
$G = (V, E, d, w)$, if $G - d/c$ has no negative-weight cycles it does not follow that there
exists a retiming $r$ of the circuit such that $\Phi(G_r) \le c$. This is most easily
demonstrated with an example. Consider the circuit of
Figure 3.1, which is configured as a ring with three registers and four computational
elements. It is impossible to get a retiming with clock-period $c = 3$, even though
$R(C^*(G)) = 3$, since there is only one register available to be placed among the three
elements of delay 2.
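The claim can be checked by brute force. The sketch below assumes one reading of the ring that is consistent with the text (three elements of delay 2, three registers and ratio 3, hence a fourth element of delay 3); the figure itself is not reproduced here, so these values and the function names are assumptions. On a single ring, the legal retimings are exactly the placements of the same number of registers on the edges, so all placements are enumerated.

```python
from itertools import product

def ring_clock_period(delays, registers):
    """Clock period of a ring: the largest delay of a register-free path.

    registers[i] is the register count on the edge from element i to
    element (i + 1) mod n; at least one entry must be positive.
    """
    n = len(delays)
    best = 0
    for start in range(n):
        # A maximal register-free path begins right after a register.
        if registers[start - 1] == 0:
            continue
        total, i = delays[start], start
        while registers[i] == 0:
            i = (i + 1) % n
            total += delays[i]
        best = max(best, total)
    return best

def min_ring_period(delays, register_count):
    """Minimum period over every placement of the registers on the ring."""
    n = len(delays)
    placements = (p for p in product(range(register_count + 1), repeat=n)
                  if sum(p) == register_count)
    return min(ring_clock_period(delays, p) for p in placements)

delays = [3, 2, 2, 2]          # assumed element delays of the ring
best_period = min_ring_period(delays, 3)
```

Under these assumptions the best achievable period is 4, which is larger than the ratio 3 but within the range from 3 to 3 + D = 6.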

Even though the converse of Theorem 3.3 is not true, we can still find an upper
bound for $\Phi_{\min}(G)$ in terms of the maximum delay-to-register ratio $R(C^*(G))$ in the
circuit and the maximum propagation delay $D$ of the circuit components.

Lemma 3.5 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register
ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain
by retiming $G$. Then

\[ \Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D. \]

Figure 3.1: A synchronous circuit $G$ with three registers and four computational elements.
The propagation delay of each element is indicated in the vertex which represents it.
The circuit cannot be retimed to have period $c = 3$, even though $G - d/c$ has
no negative-weight cycles.


Proof: We will prove that $\Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D$ by showing that $G$ can be
retimed to have clock-period $c = \lceil R(C^*(G)) \rceil + D$.

According to the mathematical programming formulation of retiming, which was
given in Theorem 2.1, the circuit $G$ can be retimed to achieve period $c$ if we can find
a set of values $r(v)$ such that

\[ r(v) - r(u) \le w(u \to v) \tag{3.1} \]

for every edge $u \to v \in E$, and

\[ r(v) - r(u) \le W(u, v) - 1 \tag{3.2} \]

for all vertices $u, v$ such that $D(u, v) > c$. Let

\[ E_W = \{ u \to v : u, v \in V,\ r(v) - r(u) \text{ is constrained by (3.2)} \}. \]

The constraint sets (3.1) and (3.2) induce the constraint graph $G_c = (V, E \cup E_W, w_c)$,
where

\[ w_c(u \to v) = \begin{cases} w(u \to v) & \text{if } u \to v \in E, \\ W(u, v) - 1 & \text{if } u \to v \in E_W. \end{cases} \]

According to Lemma 2.2 and Lemma 2.3, the circuit $G$ can be retimed to achieve
clock-period $\Phi(G_r) \le c$ exactly when $G_c$ has no negative-weight cycles. Let us assume
for the sake of contradiction that $G_c$ does have a negative-weight cycle $C_c$,
which consists of two sets of edges $E_1'$ and $E_2'$, with $E_1' \subseteq E$, $E_2' \subseteq E_W$,
$|E_1'| = n_1$ and $|E_2'| = n_2$. Since the edge-weights are integral we have

\[ \sum_{e \in E_1'} w(e) + \sum_{e \in E_2'} w_c(e) \le -1. \tag{3.3} \]

Let

\[ E_3' = \{ v_1 \to v_2 \in E : v_1 \to v_2 \text{ lies on } u \stackrel{p}{\leadsto} v,\ u \to v \in E_2' \}, \]

where $u \stackrel{p}{\leadsto} v$ denotes the critical path in $G$ from $u$ to $v$. The registers
on the critical path from $u$ to $v$ sum to $W(u, v) = w_c(u \to v) + 1$. Then, according to
inequality (3.3), we have:

\begin{align*}
\sum_{e \in E_1'} w(e) + \sum_{e \in E_3'} w(e)
  &= \sum_{e \in E_1'} w(e) + \sum_{e \in E_3'} w(e) - n_2 + n_2 \\
  &= \sum_{e \in E_1'} w(e) + \sum_{e \in E_2'} w_c(e) + n_2 \\
  &= \sum_{e \in E_1'} w_c(e) + \sum_{e \in E_2'} w_c(e) + n_2 \\
  &\le n_2 - 1.
\end{align*}

Now, for the delay-to-register ratio of the cycle which consists of the edges $E_1' \cup E_3'$ in
$G$ we have:

\begin{align*}
R(E_1' \cup E_3') &\ge \frac{\sum_{u \in E_1' \cup E_3'} d(u)}{n_2 - 1} \\
  &\ge \frac{n_2 (c - D)}{n_2 - 1} \\
  &= \frac{n_2}{n_2 - 1} \lceil R(C^*(G)) \rceil \\
  &\ge \frac{n_2}{n_2 - 1} R(C^*(G)).
\end{align*}

Since $n_2/(n_2 - 1) > 1$, we conclude that there exists a cycle in $G$ with delay-to-register
ratio greater than the maximum delay-to-register ratio $R(C^*(G))$, which is a
contradiction. Therefore, $G_c$ has no negative-weight cycles and $c = \lceil R(C^*(G)) \rceil + D$
is a feasible clock-period. Consequently $\Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D$. □

Corollary 3.4 and Lemma 3.5 imply the following.

Theorem 3.6 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register
ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain
by retiming $G$. Then

\[ \lceil R(C^*(G)) \rceil \le \Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D. \] □

Circuit Type        Transformation             Running Time
UD Combinational    Min Clock-Period           $O(E)$
Combinational       Min Clock-Period           $O(E \lg D)$
UD Sequential       Min Clock-Period           $O(\min\{V^{1/2} E \lg(VW),\ VE\})$
Sequential          Min Clock-Period           $O(VE \lg D)$
Sequential          Approx Min Clock-Period    $O(\min\{V^{1/2} E \lg(VW) \lg(VD),\ VE \lg(VD)\})$

Figure 3.2: Summary of problems and running times of corresponding algorithms. The
initials UD denote unit-delay circuitry.


3.4 Algorithmic Implications 


In this section we study the algorithmic implications of the minimum clock-period
characterization for a variety of retiming problems. We use the ideas of the previous
sections to develop fast algorithms for minimum clock-period pipelining of combinational
circuitry. We show how to obtain improved running times for clock-period
minimization of sequential circuits, using known graph-theoretic algorithms. Finally,
we give a faster algorithm for approximate clock-period minimization of general
sequential circuits. The problems considered in this section, along with the running times of
the corresponding algorithms, are summarized in Figure 3.2.


3.4.1 Minimum clock-period pipelining 


We use the ideas of the previous sections to develop fast algorithms for minimum
clock-period pipelining of combinational circuitry. Specifically, we give an optimal $O(E)$
algorithm for minimum clock-period pipelining of unit-delay combinational circuitry
and an $O(E \lg D)$ algorithm for minimum clock-period pipelining of general
combinational circuitry.

Let us consider unit-delay circuitry first. The problem of minimum clock-period
pipelining is defined as follows: Given a unit-delay combinational circuit $G = (V, E, 1, 0)$
and a positive integer $l$, determine a retiming $r$ such that $G_r$ is a pipelined combinational
circuit with latency no greater than $l$ and with minimum clock-period. The following
lemma characterizes the minimum feasible clock-period in terms of the longest
propagation delay $\Delta$ of a path in the circuit and the latency $l$.

Lemma 3.7 Let $G = (V, E, 1, 0)$ be a unit-delay combinational circuit with input
interface $v_I$ and output interface $v_O$. Let $\Delta$ be the number of vertices in the longest
path $p_\Delta = v_I \leadsto v_O$ in $G$, and let $l$ be a positive integer. Then the minimum
clock-period $\Phi_{\min}(G)$ for any pipelined version of $G$ with latency $l$ is

\[ \Phi_{\min}(G) = \lceil \Delta/(l+1) \rceil. \]
Proof: Any retiming $r$ of the circuit that gives a pipelined version of the circuit with
latency $l$ satisfies constraints (2.1) and (2.2) as well as a latency constraint. Specifically,
it satisfies

\[ r(v) - r(u) \le 0 \]

for every edge $u \to v$ in $E$,

\[ r(v) - r(u) \le -1 \]

for all vertices $u, v \in V$ such that $D(u, v) > c$, and

\[ r(v_I) - r(v_O) \le l. \]

This set of inequalities induces the constraint graph $G_c = (V_c, E_c, w_c)$, and according
to Lemma 2.2 and Lemma 2.3 it is feasible if and only if there exists no
negative-weight cycle in $G_c$. We shall use this statement to show that
$\Phi_{\min}(G) = \lceil \Delta/(l+1) \rceil$.

First we show that $\lceil \Delta/(l+1) \rceil$ is a lower bound for $\Phi_{\min}(G)$. Let $r$ be a feasible
retiming of the circuit with latency $l$ and clock-period $c$. Every path in $G_r$ from $v_I$
to $v_O$ has $l + 1$ register-free parts. Consider the longest such path $p_\Delta$ with delay $\Delta$.
Adding up all the contributions yields $\Delta \le c(l+1)$, which implies $c \ge \Delta/(l+1)$.
Therefore, $\Phi_{\min}(G) \ge \Delta/(l+1)$, or $\Phi_{\min}(G) \ge \lceil \Delta/(l+1) \rceil$, since
$\Phi_{\min}(G)$ must be an integer.

Now, we prove that $\lceil \Delta/(l+1) \rceil$, the lower bound of $\Phi_{\min}(G)$, is a feasible
clock-period, thus establishing the desired equality. In order to prove feasibility of the lower
bound it suffices to show that $G_c$ has no negative-weight cycles for $c = \lceil \Delta/(l+1) \rceil$.
Equivalently, since the maximum number of $-1$ edges in any path is

\[ \left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l+1) \rceil} \right\rfloor, \]

it suffices to show that

\[ \left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l+1) \rceil} \right\rfloor \le l. \]

We have

\begin{align*}
\left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l+1) \rceil} \right\rfloor
  &\le \left\lfloor \frac{\Delta - 1}{\Delta/(l+1)} \right\rfloor \\
  &= \lfloor (l+1)(\Delta - 1)/\Delta \rfloor \\
  &= \lfloor (l+1)(1 - 1/\Delta) \rfloor \\
  &= \lfloor l + 1 - (l+1)/\Delta \rfloor \\
  &= l,
\end{align*}

since $0 < (l+1)/\Delta \le 1$. Therefore, $G_c$ has no negative-weight cycles, which implies that
$\lceil \Delta/(l+1) \rceil$ is a feasible clock-period in addition to being a lower bound for
$\Phi_{\min}(G)$. Therefore $\Phi_{\min}(G) = \lceil \Delta/(l+1) \rceil$. □
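The key inequality of the proof, namely that the floor of $(\Delta - 1)/\lceil \Delta/(l+1) \rceil$ does not exceed $l$ whenever $l + 1 \le \Delta$, can also be spot-checked numerically. The following sketch is only a sanity check over a finite range chosen for illustration; the function name is an assumption.

```python
def max_forced_registers(delta, l):
    """floor((delta - 1) / ceil(delta / (l + 1))) in exact integer arithmetic."""
    c = -(-delta // (l + 1))       # ceil(delta / (l + 1))
    return (delta - 1) // c

# Collect every pair (delta, l) in the tested range that breaks the bound.
violations = [
    (delta, l)
    for delta in range(1, 200)
    for l in range(1, delta)       # keeps (l + 1) / delta <= 1
    if max_forced_registers(delta, l) > l
]
```

An empty list of violations agrees with the derivation above; it does not replace it.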

Now, we give the following algorithm for the problem of minimum clock-period
pipelining. The correctness of the algorithm follows from Theorem 2.11 and Lemma 3.7.

Algorithm UDMPP (Unit-Delay Minimum Period Pipelining) Given a unit-delay
combinational circuit $G = (V, E, 1, 0)$ with input interface $v_I$ and output interface $v_O$,
and a positive integer $l$, determine a retiming $r$ such that $G_r$ is a pipelined combinational
circuit with latency $l$ and minimum clock-period.

1. Determine the number of vertices $\Delta$ in the longest path $p_\Delta$ in $G$ from $v_I$ to $v_O$.

2. Run Algorithm MLP on $G$ with clock-period $\lceil \Delta/(l+1) \rceil$. □

The algorithm terminates in $O(E)$ steps, since Step 1 is a depth-first search in the
graph and Algorithm MLP runs in $O(E)$ steps.
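Step 1 and the clock-period passed to Step 2 can be sketched as follows. Algorithm MLP itself is defined elsewhere and is not reproduced; the example circuit, the function names, and the choice to count the interface vertices as unit-delay elements are assumptions made for illustration. The longest path is computed in topological order, which takes time linear in the size of the graph.

```python
from graphlib import TopologicalSorter

def longest_path_vertex_count(successors):
    """Number of vertices on a longest path of a DAG given as an
    adjacency map {vertex: list of successors}."""
    predecessors = {v: set() for v in successors}
    for u, outs in successors.items():
        for v in outs:
            predecessors[v].add(u)
    count = {}
    # static_order() yields each vertex after all of its predecessors.
    for v in TopologicalSorter(predecessors).static_order():
        count[v] = 1 + max((count[u] for u in predecessors[v]), default=0)
    return max(count.values())

def unit_delay_pipeline_period(successors, latency):
    """ceil(delta / (latency + 1)), the period given by Lemma 3.7."""
    delta = longest_path_vertex_count(successors)
    return -(-delta // (latency + 1))

# Hypothetical combinational circuit with two paths from vI to vO.
circuit = {"vI": ["a", "d"], "a": ["b"], "b": ["c"],
           "c": ["vO"], "d": ["vO"], "vO": []}
delta = longest_path_vertex_count(circuit)
```

For this circuit the longest path has five vertices, so a latency of 1 gives a period of 3 and a latency of 2 gives a period of 2.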

Now, we consider the case of general combinational circuitry. The problem of
minimum clock-period pipelining is defined in an analogous way: Given a
combinational circuit $G = (V, E, d, 0)$ and a positive integer $l$, determine a retiming
$r$ such that $G_r$ is a pipelined combinational circuit with latency no greater than $l$ and
minimum clock-period. The following lemma characterizes the minimum feasible
clock-period in terms of the delay $\Delta$ of the longest path in the circuit, the latency $l$, and
the longest component delay $D$.

Lemma 3.8 Let $G = (V, E, d, 0)$ be a combinational circuit with input interface $v_I$
and output interface $v_O$. Let $\Delta$ be the delay of the path $p_\Delta = v_I \leadsto v_O$ in $G$ with
the longest propagation delay, and let $l$ be a positive integer. Then the minimum
clock-period $\Phi_{\min}(G)$ for any pipelined version of $G$ with latency $l$ satisfies:

\[ \left\lceil \frac{\Delta}{l+1} \right\rceil \le \Phi_{\min}(G) \le \left\lceil \frac{\Delta}{l+1} \right\rceil + D, \]

where $D$ is the longest component delay in the circuit.


Proof: Any retiming $r$ of the circuit that gives a pipelined version of the circuit with
latency $l$ satisfies constraints (2.3) and (2.4) as well as a latency constraint. Specifically,
it satisfies

\[ r(v) - r(u) \le 0 \]

for every edge $u \to v$ in $E$,

\[ r(v) - r(u) \le -1 \]

for all vertices $u, v \in V$ such that $D(u, v) > c$, and

\[ r(v_I) - r(v_O) \le l. \]

First, we derive the lower bound of the inequality. Consider the constraint graph
$G_c$ induced by the above constraints. Let $r$ be a feasible retiming of the circuit with
latency $l$ and clock-period $c$. Every path in $G_r$ from $v_I$ to $v_O$ has $l + 1$ register-free
parts. Adding up all the delays of the register-free parts along the longest such path
$p_\Delta$ yields $\Delta \le c(l+1)$, which implies $c \ge \Delta/(l+1)$. Therefore,
$\Phi_{\min}(G) \ge \Delta/(l+1)$, or $\Phi_{\min}(G) \ge \lceil \Delta/(l+1) \rceil$, since
$\Phi_{\min}(G)$ must be an integer as a consequence of the
fact that $d(v) \in \mathbb{Z}$ for every vertex $v \in V$.

Now, we establish the upper bound of the inequality by proving that
$\lceil \Delta/(l+1) \rceil + D$ is a feasible clock-period. In order to achieve this it suffices to
show that $G_c$ has no negative-weight cycles for $c = \lceil \Delta/(l+1) \rceil + D$. The maximum
number of $-1$ edges in any path is

\[ \left\lfloor \frac{\Delta - 1}{(\lceil \Delta/(l+1) \rceil + D) - D} \right\rfloor
   = \left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l+1) \rceil} \right\rfloor. \]

But we have already shown in Lemma 3.7 that

\[ \left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l+1) \rceil} \right\rfloor \le l. \]

Hence $G_c$ has no negative-weight cycles. □

We give an $O(E \lg D)$ algorithm for minimum period pipelining of combinational
circuitry. Its correctness follows from Theorem 2.11 and Lemma 3.8.

Algorithm MPP (Minimum Period Pipelining) Given a combinational circuit
$G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, and a positive integer $l$,
determine a retiming $r$ such that $G_r$ is a pipelined combinational circuit with latency
$l$ and minimum clock-period.

1. Determine the delay $\Delta$ of the longest path $p_\Delta$ in $G$ from $v_I$ to $v_O$.

2. Binary search among the $D$ possible values of $\Phi_{\min}(G)$ allowed by Lemma 3.8,
applying Algorithm MLP on $G$ at each step. □

Step 1 is a depth-first search in $G$ and Step 2 performs $O(\lg D)$ applications of
Algorithm MLP. Therefore, Algorithm MPP terminates in $O(E \lg D)$ steps.
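The binary search of Step 2 can be sketched on the simplest kind of combinational circuit, a single chain of elements. Algorithm MLP is not reproduced here; in its place the sketch uses a greedy feasibility test that is valid only for a chain, so `chain_feasible` is a stand-in oracle and, like the sample delays, an assumption of this example.

```python
def chain_feasible(delays, latency, c):
    """Can a chain of elements be cut by at most `latency` register
    stages so that every stage has total delay at most c?"""
    if max(delays) > c:
        return False
    stages, current = 1, 0
    for d in delays:
        if current + d > c:
            stages += 1            # place a register before this element
            current = 0
        current += d
    return stages <= latency + 1

def min_period_pipelining(delays, latency, feasible=chain_feasible):
    """Binary search over ceil(delta/(l+1)) .. ceil(delta/(l+1)) + D."""
    delta, big_d = sum(delays), max(delays)
    lo = -(-delta // (latency + 1))    # lower bound of Lemma 3.8
    hi = lo + big_d                    # upper bound of Lemma 3.8
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(delays, latency, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

period_a = min_period_pipelining([3, 2, 2, 2, 4, 1], latency=2)
period_b = min_period_pipelining([3, 3, 3], latency=1)
```

In the first chain the lower bound 5 is attained; in the second the lower bound is 5 but the smallest feasible period is 6, which still lies within the window of width $D$.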


3.4.2 Minimum clock-period retiming 


In this section we study the implications of the minimum period characterization for
retiming of sequential circuitry. Specifically, we consider the problem: Given a sequential
circuit $G = (V, E, d, w)$, determine a retiming $r$ such that $\Phi(G_r)$ is minimum.

We consider unit-delay circuitry first. In order to compute the minimum feasible
period of the circuit we can use Karp's $O(VE)$ algorithm for finding minimum mean
cycles in a graph [11]. Then, using Bellman-Ford's shortest-paths algorithm on $G - 1/c$
we can find a retiming $r$ such that $\Phi(G_r)$ is minimum, according to Theorem 3.1. The
overall running time is $O(VE)$, which is an improvement over the best previously
known strongly polynomial algorithm by a $\lg V$ factor, since it eliminates the need for
binary search. Using scaling we obtain an $O(V^{1/2} E \lg(VW))$ algorithm for the same
problem, where $W$ is the maximum register count among the edges. This algorithm
utilizes Orlin-Ahuja's $O(V^{1/2} E \lg(VW))$ algorithm for minimum mean cycles [20],
followed by Gabow-Tarjan's $O(V^{1/2} E \lg(VW))$ scaling algorithm for shortest-paths [8].
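For the unit-delay case, the minimum mean cycle computation can be sketched as follows. In a unit-delay circuit a cycle with $k$ vertices has $k$ edges, so its delay-to-register ratio is the reciprocal of its mean register count per edge, and the maximum ratio is the reciprocal of the minimum cycle mean. The code is a plain rendering of Karp's recurrence with every vertex allowed as a starting point; it is an illustrative sketch, and the ring used as input is hypothetical.

```python
from fractions import Fraction
from math import ceil

def min_mean_cycle(vertices, weight):
    """Karp's algorithm; returns the minimum mean edge weight of a cycle,
    or None if the graph has no cycle.

    walk[k][v] is the least weight of a walk with exactly k edges that
    ends at v (walks may start anywhere), or None if there is none.
    """
    n = len(vertices)
    walk = [{v: 0 for v in vertices}]
    for _ in range(n):
        prev, cur = walk[-1], {v: None for v in vertices}
        for (u, v), w in weight.items():
            if prev[u] is not None:
                cand = prev[u] + w
                if cur[v] is None or cand < cur[v]:
                    cur[v] = cand
        walk.append(cur)
    best = None
    for v in vertices:
        if walk[n][v] is None:
            continue
        worst = max(Fraction(walk[n][v] - walk[k][v], n - k)
                    for k in range(n) if walk[k][v] is not None)
        if best is None or worst < best:
            best = worst
    return best

# Hypothetical ring of five unit-delay elements with two registers.
ring = [0, 1, 2, 3, 4]
registers = {(0, 1): 1, (1, 2): 0, (2, 3): 1, (3, 4): 0, (4, 0): 0}
mean = min_mean_cycle(ring, registers)
min_period = ceil(1 / mean)        # Theorem 3.2: ceiling of R(C*)
```

With the period known, a single shortest-paths computation on $G - 1/c$ produces the retiming, so no binary search over $c$ is required.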
For general circuits we obtain an $O(VE \lg D)$ running time by binary searching,
with the general retiming algorithm described in [17], the range of the $D$ possible
values for the clock-period of the circuit. An interesting open question is whether we
can obtain a better running time by using scaling.

3.4.3 Approximately minimum clock-period retiming

In this section we give an algorithm for determining a retiming of a general circuit
such that the clock-period is approximately minimized. Specifically, we consider the
following problem: Given a sequential circuit $G = (V, E, d, w)$, determine a retiming $r$
such that $\Phi(G_r) \le \Phi_{\min}(G) + D \le 2\Phi_{\min}(G)$, where $D$ is the maximum propagation
delay of the circuit components. We show that using scaling this problem can be solved
faster than minimum clock-period retiming by a factor of $V^{1/2}/(\lg(VW) \lg(VD))$.

The algorithm for approximately minimum clock-period retiming is based on the
lemma that follows. We denote by $G - d/c$ the graph with vertex set $V$, edge set $E$
and edge weight $w(e) - d(v)/c$ on each edge $u \xrightarrow{e} v \in E$.


Lemma 3.9 Let $G = (V, E, d, w)$ be a circuit graph with maximum delay-to-register
ratio $R(C^*(G))$ and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain by
retiming $G$. Moreover, let $n = \lceil R(C^*(G)) \rceil$, and let $l(v)$ be the solutions of a
single-source shortest-paths problem on $G - d/n$. Then, the assignment $r(v) = \lceil l(v) \rceil$
for each vertex $v \in V$ is a retiming of $G$ such that

\[ \Phi(G_r) \le \Phi_{\min}(G) + D. \tag{3.4} \]


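Proof: Note that the shortest-path lengths $l(v)$ are well-defined, since
$G - d/\lceil R(C^*(G)) \rceil$ has no negative-weight cycles. In order to prove that
$r(v) = \lceil l(v) \rceil$ is a legal retiming with clock-period $\Phi(G_r) \le \Phi_{\min}(G) + D$,
we show that it satisfies constraints (2.1) and (2.2) with $c = \lceil R(C^*(G)) \rceil + D$. Then,
we conclude inequality (3.4) directly from Corollary 3.4.

First, we prove that $r(v) = \lceil l(v) \rceil$ for each $v \in V$ satisfies constraints (2.1). For
every edge $u \xrightarrow{e} v$ we have:

\begin{align*}
\lceil l(v) \rceil - \lceil l(u) \rceil &\le \lceil l(v) - l(u) \rceil \\
  &\le \lceil w(e) - d(v)/n \rceil \\
  &\le \lceil w(e) \rceil \\
  &= w(e),
\end{align*}

since $\lceil x \rceil - \lceil y \rceil \le \lceil x - y \rceil$ for every real $x, y$, and $w(e)$ is an
integer. Therefore, $\lceil l(v) \rceil$ satisfies (2.1).

Now, we prove that the assignment $r(v) = \lceil l(v) \rceil$ for each $v \in V$ satisfies
constraints (2.2). Consider any path
$p = u_0 \xrightarrow{e_0} u_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-2}} u_{k-1} \xrightarrow{e_{k-1}} u_k$
with delay $\sum_{i=0}^{k} d(u_i) > c$. Since the delays are integers, this delay is at least
$c + 1$. For this path we have:

\begin{align*}
l(u_k) - l(u_0)
  &\le \sum_{i=0}^{k-1} w(e_i) - \sum_{i=1}^{k} \frac{d(u_i)}{n} \\
  &= \sum_{i=0}^{k-1} w(e_i) - \sum_{i=0}^{k} \frac{d(u_i)}{n} + \frac{d(u_0)}{n} \\
  &\le \sum_{i=0}^{k-1} w(e_i) - \frac{c + 1}{n} + \frac{d(u_0)}{n} \\
  &= \sum_{i=0}^{k-1} w(e_i) - \frac{\lceil R(C^*(G)) \rceil + D + 1 - d(u_0)}{\lceil R(C^*(G)) \rceil} \\
  &\le \sum_{i=0}^{k-1} w(e_i) - \frac{\lceil R(C^*(G)) \rceil + 1}{\lceil R(C^*(G)) \rceil} \\
  &< \sum_{i=0}^{k-1} w(e_i) - 1,
\end{align*}

since $D + 1 - d(u_0) \ge 1$. Therefore,

\begin{align*}
\lceil l(u_k) \rceil - \lceil l(u_0) \rceil &\le \lceil l(u_k) - l(u_0) \rceil \\
  &\le \left\lceil \sum_{i=0}^{k-1} w(e_i) - 1 \right\rceil \\
  &= \sum_{i=0}^{k-1} w(e_i) - 1,
\end{align*}

which implies that $\lceil l(v) \rceil$ satisfies constraints (2.2).

Therefore, the assignment of lead $\lceil l(v) \rceil$ to each vertex $v \in V$ yields a legal
retiming with clock-period $\Phi(G_r) \le \lceil R(C^*(G)) \rceil + D$. From Corollary 3.4 it follows
that $\Phi(G_r) \le \Phi_{\min}(G) + D$. □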

The algorithm for approximately minimum clock-period retiming is based on Lemma 3.9
and it proceeds as follows:

Algorithm ApproxCPM (Approximate Clock-Period Minimization) Given a circuit
$G = (V, E, d, w)$ with maximum delay-to-register ratio $R(C^*(G))$, minimum feasible
clock-period $\Phi_{\min}(G)$ and maximum component delay $D$, determine a retiming $r$ of
the circuit, such that $\Phi(G_r) \le \lceil R(C^*(G)) \rceil + D \le \Phi_{\min}(G) + D$.

1. Compute $n = \lceil R(C^*(G)) \rceil$ by binary searching in the range $[1, \ldots, VD]$.

2. Let $l(v)$ be the lengths of the shortest paths in $G - d/n$ from some source vertex
$s \in V$.

3. Set $r(v) = \lceil l(v) \rceil$, for every vertex $v \in V$. □

Step 1 of the algorithm binary searches for the smallest integer $n$ that is no smaller than the
maximum delay-to-register ratio $R(C^*(G))$. This ratio is positive and cannot exceed
$VD$, since the maximum propagation delay of the circuit components is $D$ and since the
longest simple path in the circuit has at most $V$ vertices. Each one of the $O(\lg(VD))$
iterations of the binary search checks for negative-weight cycles in $G - d/n$. The value
of $\lceil R(C^*(G)) \rceil$ equals the smallest integer $n$ in the range that induces no
negative-weight cycles in $G - d/n$ [12]. Negative-weight cycles can be detected in $O(VE)$ steps,
using Bellman-Ford's algorithm [12], or in $O(V^{1/2} E \lg(VW))$ steps, where $W$ is the
maximum register count along any connection in the circuit, using Gabow-Tarjan's
shortest-paths algorithm [8]. Step 2 requires a single-source shortest-paths algorithm
and Step 3 terminates in $O(V)$ steps. Step 1 of the algorithm dominates the total
running time, yielding an $O(\min\{V^{1/2} E \lg(VW) \lg(VD),\ VE \lg(VD)\})$ running time
overall.
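The three steps can be sketched with a plain Bellman-Ford routine in exact rational arithmetic. This is an illustration under stated assumptions rather than the scaled implementation discussed above: every vertex is assumed reachable from the chosen source, the ring used as input is hypothetical, and `clock_period` is a helper added only to check the result.

```python
from fractions import Fraction
from math import ceil

def shortest_paths(vertices, weight, source):
    """Bellman-Ford; returns None when a negative-weight cycle exists."""
    dist = {v: None for v in vertices}
    dist[source] = Fraction(0)
    for _ in range(len(vertices)):
        changed = False
        for (u, v), w in weight.items():
            if dist[u] is not None and (dist[v] is None or dist[u] + w < dist[v]):
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    return None

def approx_cpm(vertices, delay, weight, source):
    """Return (n, r) with n = ceil(R(C*)) and lead r(v) = ceil(l(v))."""
    def shifted(n):                      # edge weights of G - d/n
        return {(u, v): Fraction(w) - Fraction(delay[v], n)
                for (u, v), w in weight.items()}
    lo, hi = 1, len(vertices) * max(delay.values())
    while lo < hi:                       # Step 1: binary search for n
        mid = (lo + hi) // 2
        if shortest_paths(vertices, shifted(mid), source) is None:
            lo = mid + 1
        else:
            hi = mid
    dist = shortest_paths(vertices, shifted(lo), source)    # Step 2
    return lo, {v: ceil(dist[v]) for v in vertices}         # Step 3

def clock_period(vertices, delay, weight):
    """Largest delay of a register-free path (zero-weight edges form a DAG)."""
    arrive = dict(delay)
    for _ in range(len(vertices)):
        for (u, v), w in weight.items():
            if w == 0:
                arrive[v] = max(arrive[v], arrive[u] + delay[v])
    return max(arrive.values())

# Hypothetical ring: delays 3, 2, 2, 2 and three registers.
ring = [0, 1, 2, 3]
delay = {0: 3, 1: 2, 2: 2, 3: 2}
weight = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 0}
n, r = approx_cpm(ring, delay, weight, source=0)
retimed = {(u, v): w + r[u] - r[v] for (u, v), w in weight.items()}
period = clock_period(ring, delay, retimed)
```

On this ring the search returns $n = 3$ and the retimed circuit has period 5, which respects the guarantee $\Phi(G_r) \le n + D = 6$.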

In summary, in this chapter we presented a novel and concise characterization of
the minimum clock-period that can be obtained by retiming a synchronous circuit
$G$, in terms of the maximum delay-to-register ratio of the cycles in the circuit graph
and the maximum propagation delay of the circuit components. Based on the ideas
behind this characterization, we gave an optimal algorithm for minimum clock-period
pipelining of unit-delay combinational circuitry and an efficient algorithm for minimum
clock-period pipelining of general combinational circuitry. We also gave improved
algorithms for minimum clock-period retiming of unit-delay and general circuitry. Finally,
we described a technique which yields a retiming with clock-period that does not exceed
the minimum by more than a factor of 2 and is asymptotically faster than the known
algorithms for minimum clock-period retiming.


Chapter 4 


The Closed Semiring Structure 
of Retiming 


This chapter investigates algebraic properties of retiming on unit-delay circuitry.
Specifically, we show that retiming of unit-delay circuitry can be described in terms
of a closed semiring. The three sections of this chapter are organized as follows. In
Section 4.1 we review the notion of a closed semiring. In Section 4.2 we construct the
closed semiring that captures the structure of unit-delay circuitry retiming. Finally, in
Section 4.3 we utilize the additive and multiplicative operations of the closed semiring
in order to design an $O(VE)$ algorithm for unit-delay circuitry retiming.


4.1 Preliminaries 


In this section we review the notion of a closed semiring. A more detailed exposition
can be found in [4].

Let $S$ be a set of elements, and let $\oplus$ and $\odot$ be binary operations on $S$. A system
$(S, \oplus, \odot, \bar{0}, \bar{1})$ is a closed semiring if it satisfies the following properties:

1. $(S, \oplus, \bar{0})$ is a monoid:

   • $S$ is closed under $\oplus$: $a \oplus b \in S$ for all $a, b \in S$.
   • $\oplus$ is associative: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ for all $a, b, c \in S$.
   • $\bar{0}$ is an identity element for $\oplus$: $a \oplus \bar{0} = a$ for all $a \in S$.

2. $(S, \odot, \bar{1})$ is a monoid:

   • $S$ is closed under $\odot$: $a \odot b \in S$ for all $a, b \in S$.
   • $\odot$ is associative: $(a \odot b) \odot c = a \odot (b \odot c)$ for all $a, b, c \in S$.
   • $\bar{1}$ is an identity element for $\odot$: $a \odot \bar{1} = a$ for all $a \in S$.

3. $\bar{0}$ is an annihilator: $\bar{0} \odot a = \bar{0}$ for all $a \in S$.

4. $\oplus$ is commutative: $a \oplus b = b \oplus a$ for all $a, b \in S$.

5. $\oplus$ is idempotent: $a \oplus a = a$ for all $a \in S$.

6. $\odot$ distributes over $\oplus$: $a \odot (b \oplus c) = (a \odot b) \oplus (a \odot c)$ for all $a, b, c \in S$.

7. For any infinite, countable sequence $a_1, \ldots, a_i, \ldots$ the sum
   $a_1 \oplus \cdots \oplus a_i \oplus \cdots$ exists and is unique. Associativity, commutativity
   and idempotence apply to finite as well as infinite sums.

8. $\odot$ distributes over countably infinite sums.


4.2 The Closed Semiring Construction 


In this section we present the closed semiring construction which captures unit-delay 
retiming on the original circuit graph. 


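We define the set $S$ as follows:

\[ S = \{ (r, d) : r \in \mathbb{N},\ d \in \{0, 1, \ldots, c-1\} \} \cup \{ (\infty, \infty) \}. \]

We denote the additive operation by MIN and define it as follows:

\[ (r_1, d_1) \;\mathrm{MIN}\; (r_2, d_2) = \bigl( \min\{r_1, r_2\},\ I(d_1, d_2 ; r_1, r_2) \bigr), \]

where

\[ \min\{r_1, r_2\} = \begin{cases} r_1 & \text{if } r_1 \le r_2, \\ r_2 & \text{if } r_1 > r_2; \end{cases} \]

and

\[ I(d_1, d_2 ; r_1, r_2) = \begin{cases} d_1 & \text{if } r_1 < r_2, \\ \max\{d_1, d_2\} & \text{if } r_1 = r_2, \\ d_2 & \text{if } r_1 > r_2. \end{cases} \]

We denote the multiplicative operation by $\odot$ and define it as follows:

\[ (r_1, d_1) \odot (r_2, d_2) = \bigl( r_1 + r_2 - (d_1 + d_2) \operatorname{div} c,\ d_1 +_c d_2 \bigr), \]

where

\[ (d_1 + d_2) \operatorname{div} c = \left\lfloor \frac{d_1 + d_2}{c} \right\rfloor; \]

\[ d_1 +_c d_2 = \begin{cases} (d_1 + d_2) \bmod c & \text{if } d_1 \text{ and } d_2 \text{ are finite}, \\ \infty & \text{if } d_1 \text{ or } d_2 \text{ is infinite}; \end{cases} \]

and $+$ is the ordinary addition between integers.

The identity element for the additive operation is $\bar{0} \stackrel{\mathrm{def}}{=} (\infty, \infty)$.

The identity element for the multiplicative operation is $\bar{1} \stackrel{\mathrm{def}}{=} (0, 0)$.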
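The two operations can be sketched directly in code. The sketch below reads a pair $(r, d)$ as the rational number $r - d/c$, which is the interpretation used later in Lemma 4.1; under that reading MIN is the ordinary minimum and $\odot$ is ordinary addition. The function names and the finite sample used for the brute-force check are assumptions of this example, and the check is not a substitute for a proof.

```python
from fractions import Fraction
from itertools import product

INF = float("inf")
ZERO = (INF, INF)          # identity of MIN, annihilator of the product
ONE = (0, 0)               # identity of the product

def s_min(x, y):
    """(r1, d1) MIN (r2, d2): smaller r wins; on a tie the larger d wins."""
    (r1, d1), (r2, d2) = x, y
    if r1 < r2:
        return x
    if r1 > r2:
        return y
    return (r1, max(d1, d2))

def s_mul(x, y, c):
    """(r1 + r2 - (d1 + d2) div c, (d1 + d2) mod c), or ZERO if infinite."""
    if x == ZERO or y == ZERO:
        return ZERO
    (r1, d1), (r2, d2) = x, y
    q, d = divmod(d1 + d2, c)
    return (r1 + r2 - q, d)

def value(x, c):
    """The rational number r - d/c that a finite pair (r, d) stands for."""
    return Fraction(x[0]) - Fraction(x[1], c)

def distributes(elements, c):
    """Brute-force check of property 6 on a finite sample of S."""
    return all(
        s_mul(a, s_min(x, y), c) == s_min(s_mul(a, x, c), s_mul(a, y, c))
        for a, x, y in product(elements, repeat=3)
    )

c = 3
sample = [(r, d) for r in range(3) for d in range(c)]
ok = distributes(sample, c)
```

For instance, with $c = 3$ the product of $(1, 2)$ and $(0, 2)$ is $(0, 1)$, matching $1/3 + (-2/3) = -1/3$.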
Theorem 4.1 The system $(S, \mathrm{MIN}, \odot, \bar{0}, \bar{1})$ is a closed semiring.

Proof: We prove the theorem by showing that the system satisfies all the properties
of a closed semiring.

1. $(S, \mathrm{MIN}, \bar{0})$ is a monoid. The following properties hold:

• Closedness under MIN. Obvious.

• Associativity of MIN. We must show that

\[ ((r_1, d_1) \;\mathrm{MIN}\; (r_2, d_2)) \;\mathrm{MIN}\; (r_3, d_3) = (r_1, d_1) \;\mathrm{MIN}\; ((r_2, d_2) \;\mathrm{MIN}\; (r_3, d_3)). \tag{4.1} \]

The left hand side of equation (4.1) can be rewritten as

\[ \bigl( \min\{\min\{r_1, r_2\}, r_3\},\ I(I(d_1, d_2 ; r_1, r_2), d_3 ; \min\{r_1, r_2\}, r_3) \bigr). \]

The right hand side of equation (4.1) can be rewritten as

\[ \bigl( \min\{r_1, \min\{r_2, r_3\}\},\ I(d_1, I(d_2, d_3 ; r_2, r_3) ; r_1, \min\{r_2, r_3\}) \bigr). \]

Since $\min\{\min\{r_1, r_2\}, r_3\} = \min\{r_1, \min\{r_2, r_3\}\}$, the first coordinates of the two sides
are clearly equal. For associativity to hold, it remains to show that

\[ I(I(d_1, d_2 ; r_1, r_2), d_3 ; \min\{r_1, r_2\}, r_3) = I(d_1, I(d_2, d_3 ; r_2, r_3) ; r_1, \min\{r_2, r_3\}). \tag{4.2} \]

Applying the definitions of the operations to both sides of equation (4.2) we
obtain the same expression:

\[ \begin{cases} d_i & \text{if } r_i < r_j, r_k, \\ \max\{d_i, d_j\} & \text{if } r_i = r_j < r_k, \\ \max\{d_1, d_2, d_3\} & \text{if } r_1 = r_2 = r_3, \end{cases} \]

for distinct $i, j, k \in \{1, 2, 3\}$. Therefore, MIN is associative.

• $\bar{0}$ is an identity element. We have

\begin{align*}
(r, d) \;\mathrm{MIN}\; (\infty, \infty) &= (\infty, \infty) \;\mathrm{MIN}\; (r, d) \\
  &= \bigl( \min\{r, \infty\},\ I(d, \infty ; r, \infty) \bigr) \\
  &= (r, d).
\end{align*}


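2. $(S, \odot, \bar{1})$ is a monoid. The following properties hold:

• Closedness under $\odot$. Obvious.

• Associativity of $\odot$. We must show that

\[ ((a, b) \odot (r, d)) \odot (e, f) = (a, b) \odot ((r, d) \odot (e, f)). \tag{4.3} \]

If one of the pairs equals $(\infty, \infty)$ the relation holds. Otherwise, the left
hand side of equation (4.3) equals

\[ \bigl( a + r + e - (b + d) \operatorname{div} c - (f + (b + d) \bmod c) \operatorname{div} c,\ (b + d + f) \bmod c \bigr) \]

and the right hand side of equation (4.3) equals

\[ \bigl( a + r + e - (f + d) \operatorname{div} c - (b + (f + d) \bmod c) \operatorname{div} c,\ (b + d + f) \bmod c \bigr). \]

In order to prove associativity it remains to show that

\[ (b + d) \operatorname{div} c + (f + (b + d) \bmod c) \operatorname{div} c = (f + d) \operatorname{div} c + (b + (f + d) \bmod c) \operatorname{div} c. \tag{4.4} \]

The left hand side of equation (4.4) can be written as:

\begin{align*}
(b + d) \operatorname{div} c + (f + (b + d) \bmod c) \operatorname{div} c
  &= \left\lfloor \frac{b + d}{c} \right\rfloor + \left\lfloor \frac{f + (b + d) \bmod c}{c} \right\rfloor \\
  &= \frac{b + d - (b + d) \bmod c}{c} + \left\lfloor \frac{f + (b + d) \bmod c}{c} \right\rfloor \\
  &= \left\lfloor \frac{b + d - (b + d) \bmod c}{c} + \frac{f + (b + d) \bmod c}{c} \right\rfloor \\
  &= \left\lfloor \frac{b + d + f}{c} \right\rfloor.
\end{align*}

Similarly, the right hand side of equation (4.4) can be shown to be equal to
$\lfloor (b + d + f)/c \rfloor$. Therefore, $\odot$ is associative.

• $\bar{1}$ is an identity element. We have

\begin{align*}
(r, d) \odot (0, 0) &= (0, 0) \odot (r, d) \\
  &= \bigl( r + 0 - \lfloor d/c \rfloor,\ d +_c 0 \bigr) \\
  &= (r, d).
\end{align*}

3. $\bar{0}$ is an annihilator: We have

\begin{align*}
(\infty, \infty) \odot (r, d) &= (r, d) \odot (\infty, \infty) \\
  &= (\infty + r,\ \infty +_c d) \\
  &= (\infty, \infty).
\end{align*}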
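4. MIN is commutative: We have

\begin{align*}
(a, b) \;\mathrm{MIN}\; (r, d) &= \bigl( \min\{a, r\},\ I(b, d ; a, r) \bigr) \\
  &= \bigl( \min\{r, a\},\ I(d, b ; r, a) \bigr) \\
  &= (r, d) \;\mathrm{MIN}\; (a, b),
\end{align*}

since it is clear that $I(b, d ; a, r) = I(d, b ; r, a)$.

5. MIN is idempotent: By a simple application of the definition,

\begin{align*}
(r, d) \;\mathrm{MIN}\; (r, d) &= \bigl( \min\{r, r\},\ I(d, d ; r, r) \bigr) \\
  &= (r, d).
\end{align*}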
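6. $\odot$ distributes over MIN: We must show that

\[ (a, b) \odot ((r, d) \;\mathrm{MIN}\; (e, f)) = ((a, b) \odot (r, d)) \;\mathrm{MIN}\; ((a, b) \odot (e, f)). \tag{4.5} \]

For convenience, let $(L_1, L_2) = (a, b) \odot ((r, d) \;\mathrm{MIN}\; (e, f))$ and
$(R_1, R_2) = ((a, b) \odot (r, d)) \;\mathrm{MIN}\; ((a, b) \odot (e, f))$. Applying the definitions we have that

\begin{align*}
L_1 &= \min\{a + r, a + e\} - I(b + d, b + f ; r, e) \operatorname{div} c, \\
L_2 &= (b + I(d, f ; r, e)) \bmod c;
\end{align*}

and

\begin{align*}
R_1 &= \min\{ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c \}, \\
R_2 &= I\bigl( (b + d) \bmod c,\ (b + f) \bmod c ;\ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c \bigr).
\end{align*}

First we show that $L_1 = R_1$.

\begin{align*}
L_1 &= \min\{a + r, a + e\} - I(b + d, b + f ; r, e) \operatorname{div} c \\
  &= \min\{a + r, a + e\} - I((b + d) \operatorname{div} c, (b + f) \operatorname{div} c ; r, e) \\
  &= \begin{cases}
       a + r - (b + d) \operatorname{div} c & \text{if } r < e, \\
       a + r - \max\{(b + d) \operatorname{div} c, (b + f) \operatorname{div} c\} & \text{if } r = e, \\
       a + e - (b + f) \operatorname{div} c & \text{if } r > e
     \end{cases} \\
  &= \min\{ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c \} \\
  &= R_1.
\end{align*}

Now, we show that $L_2 = R_2$.

\begin{align*}
L_2 &= (b + I(d, f ; r, e)) \bmod c \\
  &= I(b + d, b + f ; r, e) \bmod c.
\end{align*}

Consider the following three possible combinations of values for $(b + d) \operatorname{div} c$ and
$(b + f) \operatorname{div} c$.

Case 1: $(b + d) \operatorname{div} c = (b + f) \operatorname{div} c$. Then:

\begin{align*}
L_2 &= I((b + d) \bmod c, (b + f) \bmod c ; r, e) \\
  &= I\bigl( (b + d) \bmod c, (b + f) \bmod c ;\ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c \bigr) \\
  &= R_2.
\end{align*}

Case 2: $(b + d) \operatorname{div} c = 1$ and $(b + f) \operatorname{div} c = 0$. In this case $f < d$ and
$(b + d) \bmod c < (b + f) \bmod c$, since $(b + d) \bmod c = b + d - c < b + f = (b + f) \bmod c$
and $f, d < c$. Now, consider the two possible relations between $r$ and $e$.

$r \le e$: In this case $a + r - (b + d) \operatorname{div} c < a + e - (b + f) \operatorname{div} c$ and consequently

\[ L_2 = R_2 = (b + d) \bmod c. \]

$r > e$: In this case

\begin{align*}
L_2 &= (b + f) \bmod c \\
  &= \max\{ (b + d) \bmod c, (b + f) \bmod c \} \\
  &= R_2.
\end{align*}

Case 3: $(b + d) \operatorname{div} c = 0$ and $(b + f) \operatorname{div} c = 1$. Symmetric to Case 2.

From Cases 1, 2, and 3, we conclude that $L_2 = R_2$. Therefore, equality (4.5) holds,
since $(L_1, L_2) = (R_1, R_2)$, and consequently $\odot$ distributes over MIN.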
7. MIN gives a unique result when operating on countably infinite sequences of
arguments. Also, associativity, commutativity and idempotence apply to finite as well as
infinite sums, as can be readily seen from the definitions of the operations.

8. The multiplicative operation $\odot$ distributes over countably infinite sums, as we can
easily demonstrate by a simple induction.

Items 1 through 8 demonstrate the correctness of the theorem. □


4.3 An Algorithm for Unit-Delay Circuitry Retiming 


In this section we give a Bellman-Ford type algorithm for retiming of unit-delay
circuitry, which operates on the original circuit graph. Specifically, given a unit-delay
circuit $G = (V, E, 1, w)$ and a positive integer $c$, we determine a retiming $r$ of $G$ such
that the clock-period $\Phi(G_r)$ of the retimed circuit $G_r$ satisfies $\Phi(G_r) \le c$. Our
algorithm terminates in $O(VE)$ steps and it matches the best previously known strongly
polynomial algorithm for the same problem [17], which, according to Theorem 3.1, is
obtained by running Bellman-Ford on $G - 1/c$.

For the reader's convenience, we give here, without proof of correctness, the
Bellman-Ford algorithm on $G - 1/c$ for unit-delay circuitry retiming.


Algorithm BF This algorithm, given a unit-delay circuit graph $G = (V, E, 1, w)$ and
an upper bound $c$ for the clock-period, determines a function $p : V \to \mathbb{R}$, such that
the retimed circuit $G_{\lceil p \rceil}$ satisfies the clock-period constraint $\Phi(G_{\lceil p \rceil}) \le c$.

1. For some vertex $s \in V$ set $p(s) = 0$. For all vertices $v$ in $V - \{s\}$ set $p(v) = \infty$.

2. Repeat $V - 1$ times:
   For each edge $u \xrightarrow{e} v \in E$ set $p(v) = \min\{ p(v),\ p(u) + w(e) - 1/c \}$.

3. For each edge $u \xrightarrow{e} v \in E$ set $w_r(e) = w(e) + \lceil p(u) \rceil - \lceil p(v) \rceil$. □


In our algorithm, the additive and multiplicative operations utilized are the MIN 


and © operations introduced in the previous section. The elements of the set S of the 
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semiring are labels h(v) = (r(v), d(v)) associated with each vertex v of the graph. The 


algorithm proceeds as follows: 


Algorithm R This algorithm, given a unit-delay circuit graph $G = (V, E, 1, w)$ and an upper bound $c$ for the clock-period, determines a retiming $r$, such that the retimed circuit satisfies the clock-period constraint $\Phi(G_r) \le c$.

1. For some vertex $s \in V$ set $h(s) = (0, 0)$. For all vertices $v$ in $V - \{s\}$ set $h(v) = (\infty, \infty)$.

2. Repeat $V - 1$ times:
For each edge $u \xrightarrow{e} v \in E$ set $h(v) = \mathrm{MIN}(h(v), h(u)$ © $(w(e), 1))$.

3. For each edge $u \xrightarrow{e} v \in E$ set $w_r(e) = w(e) + r(u) - r(v)$. □
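Algorithm R can be sketched in Python along the same lines. The two semiring operations are defined in the previous section, which is not reproduced here, so the versions below are assumptions inferred from the update rules that appear in the proof of Lemma 4.1: the multiplicative operation maps $(r, d)$ and $(w, 1)$ to $(r + w - \lfloor (d+1)/c \rfloor, (d+1) \bmod c)$, and MIN picks the label with the smaller $r$, breaking ties in favor of the larger $d$. The names `otimes`, `semiring_min`, `algorithm_r` and the example circuit are hypothetical.

```python
import math

INF = math.inf


def otimes(h, w, c):
    """Assumed multiplicative operation: extend label h = (r, d) along an
    edge with w registers and one unit of delay."""
    r, d = h
    return (r + w - (d + 1) // c, (d + 1) % c)


def semiring_min(h1, h2):
    """Assumed MIN: smaller r wins; on equal r the larger d wins."""
    if h1[0] != h2[0]:
        return h1 if h1[0] < h2[0] else h2
    return h1 if h1[1] >= h2[1] else h2


def algorithm_r(vertices, edges, c, s):
    """Sketch of Algorithm R; edges is a list of (u, v, w) triples."""
    h = {v: (INF, INF) for v in vertices}
    h[s] = (0, 0)
    # Step 2: V - 1 rounds of relaxation with integer labels only.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if h[u][0] != INF:
                h[v] = semiring_min(h[v], otimes(h[u], w, c))
    # Step 3: w_r(e) = w(e) + r(u) - r(v).
    w_r = [w + h[u][0] - h[v][0] for u, v, w in edges]
    return h, w_r


# Hypothetical three-vertex ring: a -> b (1 register), b -> c (0), c -> a (1).
ring = [("a", "b", 1), ("b", "c", 0), ("c", "a", 1)]
labels, w_r = algorithm_r(["a", "b", "c"], ring, c=2, s="a")
print(labels, w_r)
```

All arithmetic stays in the integers, which is the practical appeal of working on $G$ directly rather than on $G - 1/c$.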


The correctness of Algorithm R is ensured by the following lemma, which shows that the operation of Algorithm R on $G$ simulates the operation of Algorithm BF on $G - 1/c$.

Lemma 4.1 Let $G = (V, E, 1, w)$ be a unit-delay circuit graph. Let $p(v)$ be the variables of Bellman-Ford on $G - 1/c$ for each vertex $v \in V$. Moreover, let $(r(v), d(v))$ be the variables of Algorithm R on $G$ for each vertex $v \in V$. If both Algorithm BF and Algorithm R relax edges in the same order, then after each relaxation of an edge $u \xrightarrow{e} v \in E$ we have $p(v) = r(v) - d(v)/c$ for every vertex $v \in V$.


Proof: The proof is by induction on the relaxations. Let $p^b(v)$ and $(r^b(v), d^b(v))$ denote the values of the variables of Algorithms BF and R respectively before the relaxation of an edge $x \xrightarrow{e} y$. Similarly, let $p^a(v)$ and $(r^a(v), d^a(v))$ denote the values of the variables after the relaxation of the edge $x \xrightarrow{e} y$. We shall show that if $p^b(v) = r^b(v) - d^b(v)/c$ for every vertex $v \in V$ and both algorithms relax the same edge $x \xrightarrow{e} y \in E$, then $p^a(v) = r^a(v) - d^a(v)/c$ for every vertex $v \in V$ after the relaxation.

Initially, before any relaxation is performed, the statement holds, assuming $\infty - \infty/c = \infty$ for every vertex $v \in V - \{s\}$ and since $0 - 0/c = 0$ for $v = s$. Now, let $x \xrightarrow{e} y$ be the edge to be relaxed. Then

$$(r^a(v), d^a(v)) = (r^b(v), d^b(v))$$

for every vertex $v \ne y$, and

$$r^a(y) = \min\{r^b(y),\ r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor\},$$
$$d^a(y) = l\left(d^b(y),\ (d^b(x)+1) \bmod c;\ r^b(y),\ r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor\right).$$

We consider the following three cases, based on whether $r^b(y)$ is smaller than, greater than or equal to $r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$.
1. $r^b(y) < r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$. In this case the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = (r^b(y), d^b(y)). \qquad (4.6)$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. Since $r^b(v) \in \mathbb{Z}$ for all vertices $v \in V$, the inequality $r^b(y) < r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$ implies that:

$$\begin{aligned}
r^b(y) &\le r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor - 1 \\
&< r^b(x) + w(e) - (d^b(x)+1)/c \\
&= p^b(x) + w(e) - 1/c,
\end{aligned}$$

given that $1 + \lfloor (d^b(x)+1)/c \rfloor > (d^b(x)+1)/c$, for $d^b(x) \in \{0, \ldots, c-1\}$, and that $r^b(x) - d^b(x)/c = p^b(x)$ by the inductive assumption. We also have that

$$p^b(y) = r^b(y) - d^b(y)/c \le r^b(y).$$

Therefore $p^b(y) < p^b(x) + w(e) - 1/c$ and the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF yields

$$p^a(y) = p^b(y). \qquad (4.7)$$

From equations (4.6) and (4.7) and the inductive assumption we have that

$$p^a(y) = r^a(y) - d^a(y)/c.$$


2. $r^b(y) > r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$. In this case the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = \left(r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor,\ (d^b(x)+1) \bmod c\right). \qquad (4.8)$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. Since $r^b(v) \in \mathbb{Z}$ and $d^b(v)/c < 1$ for all vertices $v \in V$, the inequality $r^b(y) > r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$ implies that:

$$\begin{aligned}
p^b(y) &= r^b(y) - d^b(y)/c \\
&\ge r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor + 1 - d^b(y)/c \\
&> r^b(x) + w(e) - (d^b(x)+1)/c \\
&= p^b(x) + w(e) - 1/c.
\end{aligned}$$

Therefore the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF yields

$$p^a(y) = p^b(x) + w(e) - 1/c. \qquad (4.9)$$

From equations (4.8) and (4.9) and the inductive assumption we have that

$$\begin{aligned}
r^a(y) - d^a(y)/c &= r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor - ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + d^b(x)/c + w(e) - \lfloor (d^b(x)+1)/c \rfloor - ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + w(e) - 1/c + (d^b(x)+1)/c - \lfloor (d^b(x)+1)/c \rfloor - ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + w(e) - 1/c \\
&= p^a(y),
\end{aligned}$$

since

$$(d^b(x)+1)/c = \lfloor (d^b(x)+1)/c \rfloor + ((d^b(x)+1) \bmod c)/c \qquad (4.10)$$

for $d^b(x) \in \{0, \ldots, c-1\}$, as can be easily verified by checking the cases $d^b(x) = c - 1$ and $d^b(x) < c - 1$.
3. $r^b(y) = r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$. In this case, the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = \left(r^b(y),\ \max\{d^b(y), (d^b(x)+1) \bmod c\}\right). \qquad (4.11)$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. By the inductive assumption and equation (4.10) the equality $r^b(y) = r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor$ implies that

$$\begin{aligned}
p^b(y) &= p^b(x) + w(e) + (d^b(x) - d^b(y))/c - \lfloor (d^b(x)+1)/c \rfloor \\
&= p^b(x) + w(e) + (d^b(x) - d^b(y))/c - (d^b(x)+1)/c + ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + w(e) - 1/c + \left((d^b(x)+1) \bmod c - d^b(y)\right)/c.
\end{aligned}$$

We consider two cases, depending on the ordering of $(d^b(x)+1) \bmod c$ and $d^b(y)$.

Case A: $(d^b(x)+1) \bmod c - d^b(y) > 0$. In this case $p^b(y) > p^b(x) + w(e) - 1/c$ and Algorithm BF yields

$$p^a(y) = p^b(x) + w(e) - 1/c. \qquad (4.12)$$

From equations (4.11), (4.12), and (4.10), and the inductive assumption we have that

$$\begin{aligned}
r^a(y) - d^a(y)/c &= r^b(x) + w(e) - \lfloor (d^b(x)+1)/c \rfloor - ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + d^b(x)/c + w(e) - \lfloor (d^b(x)+1)/c \rfloor - ((d^b(x)+1) \bmod c)/c \\
&= p^b(x) + d^b(x)/c + w(e) - (d^b(x)+1)/c \\
&= p^b(x) + w(e) - 1/c \\
&= p^a(y).
\end{aligned}$$

Case B: $(d^b(x)+1) \bmod c - d^b(y) \le 0$. In this case Algorithm BF yields

$$p^a(y) = p^b(y). \qquad (4.13)$$

From equations (4.11) and (4.13) and the inductive assumption we have that

$$r^a(y) - d^a(y)/c = r^b(y) - d^b(y)/c = p^b(y) = p^a(y).$$

Therefore, we still have $r^a(y) - d^a(y)/c = p^a(y)$.


From cases 1, 2 and 3 we conclude that if both algorithms relax edges in the same order then $p(v) = r(v) - d(v)/c$ for every $v \in V$ after each relaxation. □
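As an illustrative check of identity (4.10) and of the invariant $p(v) = r(v) - d(v)/c$, the following worked example relaxes a single edge $x \xrightarrow{e} y$. The values $c = 3$, $r^b(x) = 4$, $d^b(x) = 2$ and $w(e) = 1$ are hypothetical and chosen only for illustration.

```latex
\begin{align*}
\frac{d^b(x)+1}{c} &= \frac{3}{3} = 1
  = \left\lfloor \frac{3}{3} \right\rfloor + \frac{3 \bmod 3}{3} = 1 + 0, \\
p^b(x) &= r^b(x) - \frac{d^b(x)}{c} = 4 - \frac{2}{3} = \frac{10}{3}, \\
\text{Algorithm BF:}\quad p^b(x) + w(e) - \frac{1}{c}
  &= \frac{10}{3} + 1 - \frac{1}{3} = 4, \\
\text{Algorithm R:}\quad
  \left(r^b(x) + w(e) - \left\lfloor \tfrac{d^b(x)+1}{c} \right\rfloor,\;
        (d^b(x)+1) \bmod c\right) &= (4 + 1 - 1,\; 0) = (4, 0),
  \qquad 4 - \frac{0}{3} = 4.
\end{align*}
```

Both algorithms propose the same candidate value for $y$, which is the content of the lemma for this single relaxation.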
Now, the following theorem shows that the set of values $r(v)$ computed by Algorithm R yields a legal retiming of the unit-delay circuit $G$.

Theorem 4.2 Let $G = (V, E, 1, w)$ be a unit-delay circuit graph and let $c$ be a positive integer. Also, let $(r(v), d(v))$ be the variables of Algorithm R on $G$ for each vertex $v \in V$. Then, after the termination of Algorithm R, $r(v)$ yields a retiming of $G$ such that $\Phi(G_r) \le c$.




Proof: Let $p(v)$ be the variables of Bellman-Ford on $G - 1/c$ for each vertex $v \in V$. From Lemma 4.1 and the facts that $r(v) \in \mathbb{Z}$ and $0 \le d(v) < c$ for every $v \in V$, we have that $\lceil p(v) \rceil = r(v)$. This equality and the correctness of Algorithm BF imply that $r(v)$ yields a legal retiming of $G$, such that $\Phi(G_r) \le c$. □


In summary, this chapter exhibits the closed semiring structure of retiming on a unit-delay circuit $G$ and demonstrates a Bellman-Ford type algorithm, which uses the additive and multiplicative operations of the semiring in order to compute a legal retiming of the circuit. The algorithm operates only on the original graph $G$ and its running time matches that of the best previously known strongly polynomial algorithm for the same problem.


Chapter 5 


A Mixed-Integer Optimization 
Problem 


This chapter investigates a mixed-integer optimization problem which arises in the mixed-integer optimization framework of retiming, as it was introduced in [17]. We present a polynomial time algorithm for the problem, that is based on the technique of introducing additional constraints, known as cuts, in such a way that the integrality constraints of the mixed-integer problem are met by the optimum solution of its linear programming relaxation.

The five sections of the chapter are organized as follows. Section 5.1 reviews the problem of finding a minimum-cost flow on a network. It also presents the dual problem of a minimum-cost flow and gives optimality conditions which relate primal and dual solutions. Section 5.2 introduces the mixed-integer optimization problem that we solve in this chapter. The problem is identified as the restricted case of a mixed-integer dual of an uncapacitated minimum-cost flow, because the relaxation of its integrality constraints reduces it to the dual of an uncapacitated minimum-cost flow problem. Based on this observation, we develop feasibility and optimality conditions for the mixed-integer problem in Section 5.3. Section 5.4 describes an algorithm that solves the mixed-integer problem in $O(V^3 \lg V)$ steps. Finally, Section 5.5 gives an application of our algorithm by reducing the problem of state minimization of synchronous circuitry to the mixed-integer problem that we solve in this chapter.
5.1 Preliminaries


In this section we give some basic background material on the problem of finding a 
minimum-cost flow in a network. 

A flow network $G = (V, E, w, c)$ is an edge-weighted directed graph in which each edge $u \xrightarrow{e} v \in E$ has a weight $w(e)$ and capacity $c(e) > 0$. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. A flow in $G$ is a real-valued function $f : E \to \mathbb{R}$ that satisfies the following two properties:

$$\sum_{v \xrightarrow{e} u \in E} f(e) - \sum_{u \xrightarrow{e} v \in E} f(e) = b(u) \qquad (5.1)$$

for all vertices $u \in V$, and

$$0 \le f(e) \le c(e) \qquad (5.2)$$

for all edges $u \xrightarrow{e} v \in E$.

A flow network $G = (V, E, w, c)$ with $c(e) = \infty$ for all edges $e \in E$ is called uncapacitated. For simplicity, in the rest of this paper we shall denote an uncapacitated network by $G = (V, E, w)$. The problem of finding a minimum-cost flow on an uncapacitated network $G = (V, E, w)$ is defined as follows.

Problem UMC-Flow (Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$ be an uncapacitated flow network. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. A minimum-cost flow $f$ on $G$ is a flow that minimizes

$$\sum_{u \xrightarrow{e} v \in E} w(e) f(e).$$ □


The linear programming dual of Problem UMC-Flow is defined as follows.

Problem DUMC-Flow (Dual Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$ be an uncapacitated flow network. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. Determine a value $x(v)$ for each vertex $v \in V$ that maximizes $\sum_{v \in V} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \qquad (5.3)$$

for all edges $u \xrightarrow{e} v \in E$. □




Note that there are no integrality constraints on the solutions $x$. The mixed-integer problem that we solve, and that we present in the following section, has the form of Problem DUMC-Flow with the addition of integrality constraints on a subset of the variables in $x$.

The following theorem is a direct consequence of the primal-dual relation of Problems UMC-Flow and DUMC-Flow.

Theorem 5.1 Let $f^*$ be a flow that solves Problem UMC-Flow and let $Z_p(f^*) = \sum_{e \in E} w(e) f^*(e)$. Similarly, let $x^*$ be an assignment that solves Problem DUMC-Flow and let $Z_d(x^*) = \sum_{v \in V} x^*(v) b(v)$. Then $Z_p(f^*) = Z_d(x^*)$. □


Almost all algorithms for Problems UMC-Flow and DUMC-Flow rely on Theorem 5.1 and they usually yield a solution for both problems at the same time. A basic concept used in these algorithms is that of the residual network $G(f)$. The residual network $G(f)$ corresponding to a flow $f$ is defined as follows: we replace each edge $u \xrightarrow{e} v \in E$ by two edges $u \xrightarrow{e} v$ and $v \xrightarrow{e'} u$. The edge $u \xrightarrow{e} v$ has cost $w(e)$ and a residual capacity $r(e) = u(e) - f(e)$, where $u(e)$ denotes the capacity of $e$, and the edge $v \xrightarrow{e'} u$ has cost $-w(e)$ and residual capacity $r(e') = f(e)$. The residual network consists only of arcs with positive residual capacity.

The following theorem gives a necessary and sufficient condition for a flow $f$ to be optimum in terms of the residual network $G(f)$.


Theorem 5.2 Let $G = (V, E, w, u)$ be a flow network. Then a flow $f$ on $G$ is optimum if and only if $G(f)$ contains no negative-weight directed cycles. □

Finally, the following Lemma [2] demonstrates how to obtain an optimum solution $x$ for Problem DUMC-Flow once an optimum flow $f$ for Problem UMC-Flow is known.

Lemma 5.3 Let $G = (V, E, w, u)$ be a flow network and let $f^*$ be an optimum flow on $G$. Moreover, let $l(v)$ denote the length of the shortest path in $G(f^*)$ from some source $s \in V$ to vertex $v \in V$. Then the assignment $x(v) = l(v)$ for every vertex $v \in V$ is an optimum solution for Problem DUMC-Flow. □
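Lemma 5.3 can be illustrated with a short Python sketch for the uncapacitated case. It assumes that an optimum flow is already known, builds the residual network (every forward arc is kept because capacities are infinite, and a backward arc is added for each flow-carrying edge), and reads the dual values off Bellman-Ford distances. The helper names `residual_edges` and `bellman_ford` and the three-vertex network are hypothetical.

```python
import math


def residual_edges(edges, flow):
    """Residual arcs of an uncapacitated network for the given flow."""
    res = []
    for (u, v, w), f in zip(edges, flow):
        res.append((u, v, w))        # forward arc: infinite residual capacity
        if f > 0:
            res.append((v, u, -w))   # backward arc: residual capacity f(e)
    return res


def bellman_ford(vertices, edges, s):
    """Shortest-path lengths from s; raises if a negative cycle exists."""
    dist = {v: math.inf for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    if any(dist[u] + w < dist[v] for u, v, w in edges):
        raise ValueError("negative cycle: the flow is not optimum")
    return dist


# Hypothetical instance: one unit must travel from s to t.
vertices = ["s", "a", "t"]
edges = [("s", "a", 1), ("a", "t", 1), ("s", "t", 3)]
flow = [1, 1, 0]                      # optimum: route through a, cost 2
b = {"s": -1, "a": 0, "t": 1}         # inflow minus outflow, as in (5.1)

x = bellman_ford(vertices, residual_edges(edges, flow), "s")
primal = sum(w * f for (_, _, w), f in zip(edges, flow))
dual = sum(x[v] * b[v] for v in vertices)
print(x, primal, dual)
```

In this instance the primal and dual objective values coincide, as Theorem 5.1 requires, and the distances satisfy constraint (5.3) on every edge.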
5.2 Mixed-Integer Dual Minimum-Cost Flow


In this section we present the mixed-integer optimization problem that we solve in this chapter. We refer to the problem as the restricted mixed-integer dual of uncapacitated minimum-cost flow and we identify it as a special case of a general mixed-integer optimization problem.

The restricted mixed-integer dual of uncapacitated minimum-cost flow is defined as follows.


Problem RMI-Dual-Flow (Restricted Mixed-Integer Dual of Uncapacitated Minimum-Cost Flow) Given an uncapacitated flow network $G = (V, E, w)$ with $w(e) \in \mathbb{R}$, a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V_I$ such that $\sum_{v \in V_I} b(v) = 0$ and $b(v) = 0$ for all $v \notin V_I$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $Z(x) = \sum_{v \in V_I} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \qquad (5.4)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) \in \mathbb{Z} \qquad (5.5)$$

for every $v \in V_I$. □


Observe that the maximization of the sum is performed over the subset $V_I$ of $V$, which is required to take on integer values. The reason that we identify the problem as the mixed-integer dual of an uncapacitated minimum-cost flow is that if we relax the integrality constraints (5.5) it reduces to Problem DUMC-Flow, the linear programming dual of an uncapacitated minimum-cost flow problem [12, 21]. Based on this observation, we describe in Section 5.4 an $O(V^3 \lg V)$ time procedure, which solves Problem RMI-Dual-Flow.

We can generalize Problem RMI-Dual-Flow by extending the set over which the maximization is performed to include the entire vertex set $V$ of the graph.


Problem MI-Dual-Flow (Mixed-Integer Dual of Uncapacitated Minimum-Cost Flow) Given an uncapacitated network $G = (V, E, w)$, with $w(e) \in \mathbb{R}$, a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V$ such that $\sum_{v \in V} b(v) = 0$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $\sum_{v \in V} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) \in \mathbb{Z}$$

for every $v \in V_I$. □

We conjecture that, contrary to Problem RMI-Dual-Flow, Problem MI-Dual-Flow is not tractable.

Conjecture. Problem MI-Dual-Flow is NP-Complete. □


Two facts support our conjecture. First, the feasible vectors of Problem RMI-Dual-Flow do not form a convex set, due to the integrality constraints. Lack of convexity rules out linear programming approaches that lead to polynomial time algorithms [21, 2]. In addition, the solutions to Problem MI-Dual-Flow do not necessarily exhibit the optimal substructure property. There exist instances of Problem MI-Dual-Flow which in order to be solved require a locally suboptimal assignment of values to the unknowns. The lack of optimal substructure rules out dynamic programming approaches that could lead to polynomial time algorithms [4].


5.3 Feasibility and Optimality Conditions 


In this section we develop feasibility and optimality conditions for Problem RMI-Dual- 
Flow. Specifically, we construct an auxiliary problem by augmenting the constraint-set 
of Problem RMI-Dual-Flow with new constraints, which are derived from the given 
constraint-set. The auxiliary problem has no explicit integrality constraints and we 
prove that it is feasible if and only if Problem RMI-Dual-Flow is feasible. Finally, we 
prove that a solution of the auxiliary problem solves Problem RMI-Dual-Flow as well. 

First, let us describe how the additional constraints are obtained. Let $G = (V, E, w)$ be an edge-weighted graph and let $V_I \subseteq V$. We define the short-cut graph $G_S = (V, E_S, w_S)$ as follows.

$$E_S = \{u \xrightarrow{e} v : u, v \in V_I,\ u \leadsto v \in G\},$$
$$w_S(u \xrightarrow{e} v) = \min\{w(p) : u \stackrel{p}{\leadsto} v \in G\}.$$

We also define the dense graph $G_D = (V, E \cup E_S, w_D)$ with edge-weights defined as follows.

$$w_D(e) = \begin{cases} w(e) & \text{if } e \in E, \\ \lfloor w_S(e) \rfloor & \text{if } e \in E_S. \end{cases}$$

The edges in $E_S$ impose the additional constraints of Problem AUX. We define the auxiliary problem AUX in terms of the original graph $G$ and its corresponding short-cut graph $G_S$.


Problem AUX (Auxiliary Dual of Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$ with $w(e) \in \mathbb{R}$, be an edge-weighted graph and let $G_S = (V_S, E_S, w_S)$ be its corresponding short-cut graph. Given a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V_I$ such that $\sum_{v \in V_I} b(v) = 0$ and $b(v) = 0$ for all $v \notin V_I$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $Z(x) = \sum_{v \in V_I} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \qquad (5.6)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) - x(u) \le \lfloor w_S(e) \rfloor \qquad (5.7)$$

for every edge $u \xrightarrow{e} v \in E_S$.
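The construction of the short-cut edges and of the floored weights used in constraint (5.7) can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the thesis: it uses the Floyd-Warshall recurrence for all-pairs shortest paths, it omits self-loops $u \to u$, and the function name `short_cut_edges` and the example graph are hypothetical.

```python
import math


def short_cut_edges(vertices, edges, V_I):
    """Return the edges (u, v, floor(w_S)) of E_S, or None if G has a
    negative-weight cycle (in which case w_S is not well-defined)."""
    dist = {u: {v: math.inf for v in vertices} for u in vertices}
    for v in vertices:
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)

    # Floyd-Warshall: allow vertex k as an intermediate point.
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    if any(dist[v][v] < 0 for v in vertices):
        return None

    # One short-cut edge per ordered pair of distinct integer vertices
    # joined by a path, weighted by the floor of the shortest-path length.
    return [(u, v, math.floor(dist[u][v]))
            for u in V_I for v in V_I
            if u != v and dist[u][v] < math.inf]


# Hypothetical graph: a and b must take integer values, m is unconstrained.
vertices = ["a", "m", "b"]
edges = [("a", "m", 0.5), ("m", "b", 1.0), ("b", "a", 2.5)]
E_S = short_cut_edges(vertices, edges, V_I=["a", "b"])
dense_edges = edges + E_S            # edge set of the dense graph G_D
print(E_S)
```

The path $a \to m \to b$ has length 1.5, so any feasible integer pair must satisfy $x(b) - x(a) \le 1$, which is exactly the cut that the short-cut edge adds.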


First, we shall prove that feasibility of RMI-Dual-Flow implies feasibility of AUX by showing that every feasible vector of RMI-Dual-Flow is also a feasible vector of AUX. We denote by $\mathcal{X}_{RMI}$ the set of feasible vectors for RMI-Dual-Flow and by $\mathcal{X}^*_{RMI}$ the set of optimum vectors for RMI-Dual-Flow. Similarly for AUX, we denote its set of feasible vectors by $\mathcal{X}_{AUX}$ and its set of optimum vectors by $\mathcal{X}^*_{AUX}$.

Lemma 5.4 If $x \in \mathcal{X}_{RMI}$ then $x \in \mathcal{X}_{AUX}$.


Proof: Let $x = (x(1), \ldots, x(|V|))$ be any vector in $\mathcal{X}_{RMI}$. Then, from inequality (5.4) we have that $x(v) - x(u) \le w(e)$ for every edge $u \xrightarrow{e} v \in E$. Therefore, $x$ satisfies inequality (5.6).

Also, for every $u, v \in V_I$ let $p = u \leadsto v$ be the shortest path in $G$ from $u$ to $v$. By applying inequality (5.4) along $p$ and the definition of the short-cut graph $G_S$ we have that $x(v) - x(u) \le w_S(e)$ for $u \xrightarrow{e} v \in E_S$. Since $x(u)$ and $x(v)$ are integers from constraint (5.5), we can write $x(v) - x(u) \le \lfloor w_S(e) \rfloor$ for $u \xrightarrow{e} v \in E_S$. Therefore, $x$ satisfies inequality (5.7) as well and consequently $x \in \mathcal{X}_{AUX}$. □


As an immediate consequence we have the following. 


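Corollary 5.5 For any $x \in \mathcal{X}^*_{RMI}$ with $Z(x) = \sum_{v \in V_I} x(v) b(v)$ and for any $y \in \mathcal{X}^*_{AUX}$ with $Z(y) = \sum_{v \in V_I} y(v) b(v)$, we have

$$Z(x) \le Z(y).$$

Proof: Since $x \in \mathcal{X}^*_{RMI}$, we also have $x \in \mathcal{X}_{RMI}$. From Lemma 5.4 we infer that $x \in \mathcal{X}_{AUX}$ and therefore $Z(x) \le Z(y)$ for every $y \in \mathcal{X}^*_{AUX}$. □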


Now, we shall prove that feasibility of Problem AUX implies feasibility of Problem RMI-Dual-Flow.

Lemma 5.6 If $\mathcal{X}_{AUX} \ne \emptyset$ then $\mathcal{X}_{RMI} \ne \emptyset$.


Proof: From Lemma 2.3 and the definition of $G_S$ and $G_D$ we have that $\mathcal{X}_{AUX} \ne \emptyset$ exactly when $G_S$ is well-defined and there exists no negative-weight cycle in $G_D$. Let $x(v)$ be the length of the shortest path in $G_D$ from some source $s \in V$ to the vertex $v \in V$. Then, from Lemma 2.2, $x$ satisfies

$$x(v) - x(u) \le w(e)$$

for every $u \xrightarrow{e} v \in E$, and

$$x(v) - x(u) \le \lfloor w_S(e) \rfloor$$

for every edge $u \xrightarrow{e} v \in E_S$. Therefore, $x$ satisfies inequality (5.4).

Moreover, since for any path $p = u \leadsto v$ in $E$ with $u, v \in V_I$ there always exists an edge $u \xrightarrow{e} v \in E_D$ with $w_D(e) \le w(p)$ and $w_D(e) \in \mathbb{Z}$, the shortest path in $E_D$ from the source $s$ to any vertex $v \in V_I$ will be on integer-weight edges only, provided $s \in V_I$. Thus, by setting $x(s) = 0$ we can ensure $x(v) \in \mathbb{Z}$ for all vertices $v \in V_I$. Therefore, $x$ satisfies inequality (5.5) as well and consequently $x \in \mathcal{X}_{RMI}$. □


As a consequence of Lemmata 5.4 and 5.6, we have the following corollary. 
Corollary 5.7 Problem RMI-Dual-Flow is feasible if and only if the short-cut graph $G_S$ is well-defined and the dense graph $G_D$ has no negative-weight directed cycles. □


In the remainder of this section we show how to obtain a solution of Problem AUX that solves Problem RMI-Dual-Flow as well.

First, we shall show that there exists a primal solution of Problem AUX which has a special structure. Then we demonstrate how we can exploit this special structure in order to find a solution for Problem RMI-Dual-Flow. Recall that, according to Section 5.1, the primal of Problem AUX is an uncapacitated minimum-cost flow on $G_D = (V_D, E_D, w_D)$.


Lemma 5.8 Let $f$ be a flow on $G_D$ that solves the primal of Problem AUX. Also, let $E^{+}(f) = \{u \xrightarrow{e} v \in E_D : f(e) > 0\}$. Then there exists a flow $f'$ on $G_D$ that solves the primal of Problem AUX such that

$$E^{+}(f') \subseteq E_S.$$

Proof: Consider an optimum $f$ on $G_D$ with $f(e) > 0$ for some edge $u \xrightarrow{e} v \notin E_S$. We show that by rerouting flow we can always convert $f$ to a new flow $f'$ such that $Z_p(f) = Z_p(f')$ and $E^{+}(f') \subseteq E_S$. Since $u \xrightarrow{e} v \notin E_S$ there exists a path $p_1 = u_0 \to u_1 \to \cdots \to u_{k-1} \to u$ with $u_0 \in V_I$, $u_1, \ldots, u_{k-1} \notin V_I$, and $f(e_i) > 0$ for each of its edges $e_i$, $i = 0, 1, \ldots, k-1$, and a path $p_2 = v \to v_{l-1} \to \cdots \to v_1 \to v_0$ with $v_0 \in V_I$, $v_1, \ldots, v_{l-1} \notin V_I$, and $f(e_i) > 0$ for each of its edges $e_i$, $i = 0, 1, \ldots, l-1$. Note that as long as there exists an edge $u \xrightarrow{e} v \notin E_S$ with $f(e) > 0$, we can always find paths $p_1$ and $p_2$ constructed in the way above. If there were no such paths, then the node-balance constraints (5.1) would have been violated, since $b(v) = 0$ for every $v \notin V_I$.

Now, since $u_0, v_0 \in V_I$, and $f$ is optimum, there exists an edge $u_0 \xrightarrow{e_r} v_0 \in E_S$ with $w_D(e_r) = w_D(p_1; e; p_2)$, where $p_1; e; p_2$ denotes the path formed by concatenating $p_1$, $e$, and $p_2$. Therefore, we can reroute $\min\{f(e_i) : e_i \in p_1; e; p_2\}$ units of flow through $e_r$ and still maintain an optimum flow. Let $f_n$ be the new optimum flow. Then

$$|E^{+}(f_n) \cap E| \le |E^{+}(f) \cap E| - 1.$$

Therefore, repetition of this procedure until $E^{+}(f_n) \cap E = \emptyset$ yields an optimum flow $f'$ such that $E^{+}(f') \subseteq E_S$. □
Now, we show how we can get a solution for Problem AUX that satisfies the integrality constraints of Problem RMI-Dual-Flow. The proof relies on Lemma 5.8 above and on Lemma 5.3 of Section 5.1.


Lemma 5.9 Let $f$ be a solution for the primal of Problem AUX with $E^{+}(f) \subseteq E_S$. Then there exists $x \in \mathcal{X}^*_{AUX}$ such that

$$x(v) \in \mathbb{Z}$$

for all $v \in V_I$.


Proof: Let $d_D(f, v)$ denote the length of a shortest path in the residual graph $G_D(f)$ from a source $s \in V_I$ to a vertex $v \in V$. From Lemma 5.3 we know that once an optimum flow $f$ for the primal of Problem AUX is known, the assignment $x(v) = d_D(f, v)$ for every vertex $v \in V$ yields a solution $x$ to Problem AUX.

It remains to show that $x$ satisfies $x(v) \in \mathbb{Z}$ for all $v \in V_I$. Let us denote by $l_D(f, p)$ the length of a path $p$ in $G_D(f)$ and let $p$ be a shortest path in $G_D(f)$ from the source $s \in V_I$ to a vertex $v \in V_I$. We shall prove that $l_D(f, p) = d_D(f, v) \in \mathbb{Z}$. Let $q = v_0 \xrightarrow{e_0} v_1 \cdots \xrightarrow{e_{k-1}} v_k$ be a part of $p$ such that $v_0, v_k \in V_I$ and $v_1, \ldots, v_{k-1} \notin V_I$. Since $E^{+}(f) \subseteq E_S$, we have that either $e_i \in E \cup E_S$ for all edges $e_i \in q$, or that $k = 1$ and $v_0 \to v_1$ is a backward edge of a flow-carrying edge $v_1 \to v_0 \in E_S$. In the first case $l_D(f, q) \in \mathbb{Z}$, since $q$ has to be a shortest path from $v_0$ to $v_k$ and there always exists an edge $e \in E_S$ such that $w_D(e) \le \lfloor w(q) \rfloor$. In the second case $l_D(f, q) \in \mathbb{Z}$, since $v_1 \to v_0 \in E_S$ implies $w_D(v_1 \to v_0) \in \mathbb{Z}$ and $l_D(f, q) = -w_D(v_1 \to v_0)$ by definition. Therefore, $l_D(f, q) \in \mathbb{Z}$ for every $q$ and consequently $l_D(f, p) \in \mathbb{Z}$. □


Now, we can easily infer that the solution of Problem AUX derived in the way suggested in Lemma 5.9 is a solution for Problem RMI-Dual-Flow.

Theorem 5.10 Let $f$ be a solution for the primal of Problem AUX with $E^{+}(f) \subseteq E_S$. Let $x$ be a solution of a single-source shortest-paths problem on $G_D(f)$ from a source $s \in V_I$. Then $x$ is a solution for Problem RMI-Dual-Flow.

Proof: From Lemma 5.3 we infer that $x \in \mathcal{X}^*_{AUX}$, and consequently $x$ satisfies constraint (5.6):

$$x(v) - x(u) \le w(e)$$

for all $e \in E$. Therefore, $x$ satisfies constraint (5.4) of Problem RMI-Dual-Flow. From Lemma 5.9 we have that $x$ also satisfies the integrality constraint (5.5) of Problem RMI-Dual-Flow. Therefore, $x \in \mathcal{X}_{RMI}$, which implies that $Z(y) \ge Z(x)$ for every $y \in \mathcal{X}^*_{RMI}$. But from Corollary 5.5 we have that $Z(y) \le Z(x)$. Therefore, $x \in \mathcal{X}^*_{RMI}$ as well. □


5.4 The Algorithm 


In this section we give the $O(V^3 \lg V)$ algorithm that solves Problem RMI-Dual-Flow. Its correctness relies on the theory developed in the previous section.

Algorithm RMI-Dual-Flow This algorithm determines a solution $x$ for Problem RMI-Dual-Flow.

1. Compute the edges in $E_S$ by solving an all-pairs shortest-paths problem on $G$. Fail if a negative-weight cycle is found.

2. Compute a min-cost flow $f$ on the graph $G_D$.

3. Transform $f$ into $f'$ by rerouting flow in such a way that if $f'(e) > 0$ then $u \xrightarrow{e} v \in E_S$.

4. Compute the shortest-paths lengths $x(v)$ for each vertex $v$ in $G_D(f')$ from a source $s \in V_I$. □


Step 1 requires $V$ shortest-paths algorithms. The total cost is $O(V(E + V \lg V))$ using Johnson's all-pairs shortest-paths algorithm [10]. Step 2 executes one uncapacitated min-cost flow algorithm, which requires $O(V \lg V (E_D + V \lg V))$ steps, using Orlin's strongly polynomial algorithm [19]. Step 3 runs for $O(VE)$ time, since each rerouting eliminates flow from at least one edge in $E$ and requires $O(V)$ steps. Step 4 can be implemented in $O(V E_D)$ time, using Bellman-Ford's algorithm for shortest paths. Therefore, the overall running time is $O(V^3 \lg V)$.
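The overall bound can be traced step by step. The following derivation is a sketch that assumes $E \le V^2$ and $E_D = O(V^2)$, the latter because $E_S$ may contain an edge for every ordered pair of vertices in $V_I$.

```latex
\begin{align*}
\text{Step 1:}\quad & O(V(E + V \lg V)) = O(V^3), \\
\text{Step 2:}\quad & O(V \lg V\,(E_D + V \lg V)) = O(V \lg V \cdot V^2) = O(V^3 \lg V), \\
\text{Step 3:}\quad & O(VE) = O(V^3), \\
\text{Step 4:}\quad & O(V E_D) = O(V^3), \\
\text{Total:}\quad & O(V^3 \lg V).
\end{align*}
```

Under these assumptions the min-cost flow computation of Step 2 dominates the running time.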


5.5 An Application to State Minimization 


In this section we present the state minimization problem for retiming from the mathematical programming perspective described in [17], and we give a reduction of the problem to Problem RMI-Dual-Flow. The state minimization problem is defined as follows: For a given circuit $G = (V, E, d, w)$, determine a retiming of the circuit such that the total number of registers $\sum_{e \in E} w_r(e)$ of the retimed circuit is minimized.

First, we give without proof the following theorem from [17]. This theorem describes retiming as a mixed-integer programming problem.


Theorem 5.11 Let $G = (V, E, d, w)$ be a synchronous circuit, and let $c$ be a positive real number. There exists a retiming $r$ of $G$ such that $\Phi(G_r) \le c$ if and only if there exists an assignment of a real value $R(v)$ and an integer value $r(v)$ to each vertex $v \in V$ such that

$$R(v) - r(v) \le -d(v)/c,$$
$$r(v) - R(v) \le 1,$$

for every vertex $v \in V$, and

$$r(v) - r(u) \le w(e),$$
$$R(v) - R(u) \le w(e) - d(v)/c,$$

wherever $u \xrightarrow{e} v$. □
The number of registers $S(G_r)$ in the retimed circuit $G_r$ is

$$\begin{aligned}
S(G_r) &= \sum_{e \in E} w_r(e) \\
&= \sum_{e \in E} (w(e) + r(u) - r(v)) \\
&= \sum_{e \in E} w(e) + \sum_{u \xrightarrow{e} v} (r(u) - r(v)) \\
&= S(G) + \sum_{v \in V} r(v)(\mathrm{outdegree}(v) - \mathrm{indegree}(v)),
\end{aligned}$$

where $S(G)$ is the number of registers in the original circuit. Since $S(G)$ is constant, minimizing $S(G_r)$ is equivalent to minimizing the quantity

$$\sum_{v \in V} r(v)(\mathrm{outdegree}(v) - \mathrm{indegree}(v)),$$

which is a linear combination of the $r(v)$, since $(\mathrm{outdegree}(v) - \mathrm{indegree}(v))$ is constant for each $v$. Now, using Theorem 5.11 we can state the register minimization problem in its mixed-integer form:


Problem STMIN (State Minimization) Given a synchronous circuit $G = (V, E, d, w)$ and a positive number $c$, determine a retiming $r$ of $G$ such that $\Phi(G_r) \le c$ and $G_r$ has the minimum number of registers. Equivalently, find an assignment of a real value $R(v)$ and an integer value $r(v)$ to each vertex $v \in V$ that minimizes $\sum_{v \in V} r(v)(\mathrm{outdegree}(v) - \mathrm{indegree}(v))$ subject to

$$R(v) - r(v) \le -d(v)/c, \qquad (5.8)$$
$$r(v) - R(v) \le 1, \qquad (5.9)$$

for every vertex $v \in V$, and

$$r(v) - r(u) \le w(e), \qquad (5.10)$$
$$R(v) - R(u) \le w(e) - d(v)/c, \qquad (5.11)$$

wherever $u \xrightarrow{e} v$. □
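The identity behind the objective of Problem STMIN, namely that the register count of the retimed circuit equals $S(G)$ plus a degree-weighted sum of the $r(v)$, can be checked numerically. The sketch below is only an illustration: the function names `register_count` and `degree_form`, the example circuit, and the retiming values are hypothetical.

```python
def register_count(edges, r):
    """S(G_r) computed edge by edge: sum of w(e) + r(u) - r(v)."""
    return sum(w + r[u] - r[v] for u, v, w in edges)


def degree_form(vertices, edges, r):
    """S(G) + sum over v of r(v) * (outdegree(v) - indegree(v))."""
    outdeg = {v: 0 for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v, _ in edges:
        outdeg[u] += 1
        indeg[v] += 1
    s_g = sum(w for _, _, w in edges)
    return s_g + sum(r[v] * (outdeg[v] - indeg[v]) for v in vertices)


# Hypothetical circuit with three registers and an arbitrary legal retiming.
vertices = ["a", "b", "c"]
edges = [("a", "b", 1), ("a", "c", 0), ("b", "c", 0), ("c", "a", 2)]
r = {"a": 1, "b": 1, "c": 0}

print(register_count(edges, r), degree_form(vertices, edges, r))
```

Both expressions evaluate to the same number for any choice of `r`, because each $r(v)$ enters once with a plus sign per outgoing edge and once with a minus sign per incoming edge.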


The state minimization problem on $G = (V, E, d, w)$ can be seen from the perspective of the mixed-integer problem RMI-Dual-Flow on an uncapacitated network $G' = (V', E', w')$. The graph $G'$ is defined as follows.

$$V' = \{v_i : v \in V,\ i = 1, 2\},$$
$$E' = E'_1 \cup E'_2 \cup E'_3 \cup E'_4,$$

where

$$E'_1 = \{v_1 \to v_2 : v_1, v_2 \in V'\},$$
$$E'_2 = \{v_2 \to v_1 : v_1, v_2 \in V'\},$$
$$E'_3 = \{u_1 \to v_1 : u \to v \in E\},$$
$$E'_4 = \{u_2 \to v_2 : u \to v \in E\}.$$
The edge-weight of each edge $e \in E'$ is

$$w'(e) = \begin{cases}
-d(v)/c & \text{if } v_1 \to v_2 \in E'_1, \\
1 & \text{if } v_2 \to v_1 \in E'_2, \\
w(e) & \text{if } u_1 \xrightarrow{e} v_1 \in E'_3, \\
w(e) - d(v)/c & \text{if } u_2 \xrightarrow{e} v_2 \in E'_4.
\end{cases}$$

The unknown $r(v)$ of the state minimization problem corresponds to $x(v_1)$ and the unknown $R(v)$ corresponds to $x(v_2)$. The function $b$ is defined on $V'$ as $b(v_1) = (\mathrm{indegree}(v) - \mathrm{outdegree}(v))$ for every vertex $v_1 \in V'$, and $b(v_2) = 0$ for every vertex $v_2 \in V'$. Finally, $V_I = \{v_1 : v_1 \in V'\}$.
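The reduction can be made concrete with a small Python sketch that builds $V'$, $E'$, $w'$, $b$ and $V_I$ from a circuit. It is a sketch under assumptions: the two copies of a vertex `v` are represented as the tuples `(v, 1)` and `(v, 2)`, delays and the bound $c$ are taken to be rational so that exact fractions can be used, and the function name `build_reduction` and the example circuit are hypothetical.

```python
from fractions import Fraction


def build_reduction(vertices, edges, d, c):
    """Build the RMI-Dual-Flow instance for state minimization.

    edges: list of (u, v, w); d: dict of vertex delays; c: clock-period bound.
    Returns (V_prime, E_prime, b, V_I), with E_prime a list of (x, y, weight).
    """
    c = Fraction(c)
    copy1 = {v: (v, 1) for v in vertices}   # carries r(v), integer-valued
    copy2 = {v: (v, 2) for v in vertices}   # carries R(v), real-valued
    V_prime = list(copy1.values()) + list(copy2.values())

    E_prime = []
    for v in vertices:
        E_prime.append((copy1[v], copy2[v], -Fraction(d[v]) / c))  # E'_1
        E_prime.append((copy2[v], copy1[v], Fraction(1)))          # E'_2

    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for u, v, w in edges:
        E_prime.append((copy1[u], copy1[v], Fraction(w)))          # E'_3
        E_prime.append((copy2[u], copy2[v],
                        Fraction(w) - Fraction(d[v]) / c))         # E'_4
        outdeg[u] += 1
        indeg[v] += 1

    # b is nonzero only on the integer copies; it sums to zero by construction.
    b = {copy1[v]: indeg[v] - outdeg[v] for v in vertices}
    b.update({copy2[v]: 0 for v in vertices})
    return V_prime, E_prime, b, list(copy1.values())


# Hypothetical circuit with three vertices and four edges.
vertices = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "a", 0), ("a", "c", 1), ("c", "b", 0)]
d = {"a": 1, "b": 2, "c": 1}
V_prime, E_prime, b, V_I = build_reduction(vertices, edges, d, c=3)
print(len(V_prime), len(E_prime), b)
```

Maximizing $\sum_{v_1} x(v_1) b(v_1)$ with $b(v_1) = \mathrm{indegree}(v) - \mathrm{outdegree}(v)$ is the same as minimizing the STMIN objective, which is why the sign of the degree difference is flipped in `b`.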

In summary, in this chapter we gave a solution to a mixed-integer optimization problem. We identified the problem as the restricted mixed-integer dual of an uncapacitated minimum-cost flow by observing that its linear programming relaxation is the dual of an uncapacitated minimum-cost flow problem. Based on this observation we developed a theoretical framework for its solution and we gave a procedure that solves it in $O(V^3 \lg V)$ steps. Finally, we gave an application of our algorithm by reducing the state minimization problem for retiming to the mixed-integer problem that we solved.


Chapter 6 


Conclusion 


In this paper we have investigated properties of retiming, a synchronous circuitry optimization technique. We presented specialized, fast algorithms for retiming of combinational circuitry. Specifically, we showed that combinational circuitry can be pipelined with minimum latency in $O(E)$ steps, which is optimal within a constant factor. Clock-period minimization of combinational circuitry can be achieved in $O(E \lg D)$ steps, where $D$ is the maximum component delay in the circuit. We presented a novel and concise graph theoretic characterization of the minimum clock-period of a circuit. Based on this characterization we gave improved techniques for minimum clock-period retiming of sequential circuitry. We presented an $O(\min\{V^{1/2} E \lg(VD), VE\})$ algorithm for minimum clock-period retiming of unit-delay circuitry, and an $O(VE \lg D)$ algorithm for minimum clock-period retiming of general circuitry. We also showed that a retiming of a general circuit with clock-period that does not exceed the minimum by more than $D$ can be found in $O(\min\{V^{1/2} E \lg(VW) \lg(VD), VE \lg(VD)\})$ steps. Subsequently, we exhibited the closed semiring structure of retiming and we gave an algorithm which operates based on this structure. Finally, we gave an $O(V^3 \lg V)$ time algorithm for a mixed-integer optimization problem, which arises in the linear programming framework of retiming.

There are still open questions of both practical and theoretical interest in the area. It is an interesting question whether there exists an algorithm for minimum clock-period retiming of general circuits that matches the running time of the algorithm for the same problem on unit-delay circuitry. Decoupling the running time of our algorithm for the mixed-integer optimization Problem RMI-Dual-Flow from the number of the new constraints introduced will also be an interesting extension of the techniques presented in this thesis. Finally, proving the conjecture that Problem MI-Dual-Flow is intractable will fully elucidate the problem of optimizing mixed-integer difference constraints. Our conjecture is supported by the fact that the feasible vectors of Problem MI-Dual-Flow do not form a convex set as well as by the fact that the solutions to Problem MI-Dual-Flow do not necessarily exhibit the optimal substructure property. Lack of convexity and optimal substructure rules out linear programming and dynamic programming approaches that could lead to polynomial-time algorithms.